SQL business case
Dear Stakeholders,
I hope this message finds you well. I am writing to address a significant undertaking that has the
potential to unlock valuable insights and provide actionable recommendations for Target's
operations in Brazil. I have been assigned the task of analyzing an extensive dataset
encompassing 100,000 orders placed between 2016 and 2018. This dataset offers a
comprehensive view of various dimensions, including order status, pricing, payment and freight
performance, customer location, product attributes, and customer reviews.
Our goal in undertaking this analysis is to gain a deeper understanding of Target's operations in
Brazil, with a focus on several crucial aspects of the business. By meticulously examining this
dataset, we aim to extract meaningful insights and offer actionable recommendations that can
enhance our strategies, drive operational efficiency, and elevate the overall guest experience.
Ultimately, the objective of this endeavor is to empower our stakeholders with evidence-backed
knowledge, enabling them to make informed decisions and drive Target's success in Brazil. We
aim to provide you with:
I look forward to embarking on this analysis and delivering valuable insights that will
contribute to the growth and prosperity of Target in Brazil.
Should you have any questions or suggestions throughout this process, please do not
hesitate to reach out.
SELECT column_name, data_type
FROM `sqlcase1-target.Brazil_Market.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = 'customers';
b. Get the time range between which the orders were placed.
SELECT MIN(order_purchase_timestamp) AS First_order_timestamp,
       MAX(order_purchase_timestamp) AS Last_order_timestamp
FROM `Brazil_Market.orders`;
SELECT COUNT(DISTINCT geolocation_city) AS Number_of_Cities,
       COUNT(DISTINCT geolocation_state) AS Number_of_States
FROM `Brazil_Market.geolocation`;
2. In-depth Exploration
a. Is there a growing trend in the no. of orders placed over the past years?
Insights:
SELECT Order_Year, Order_Month, Number_of_Orders,
       NTILE(5) OVER (ORDER BY Number_of_Orders DESC) AS Months_Seasonality
FROM (
  SELECT EXTRACT(YEAR FROM order_purchase_timestamp) AS Order_Year,
         EXTRACT(MONTH FROM order_purchase_timestamp) AS Order_Month,
         COUNT(order_id) AS Number_of_Orders
  FROM `Brazil_Market.orders`
  GROUP BY Order_Year, Order_Month
) tbl
ORDER BY Number_of_Orders DESC;
Insights:
WITH cte AS (
  SELECT order_id, order_purchase_timestamp,
         CASE
           WHEN EXTRACT(HOUR FROM order_purchase_timestamp) BETWEEN 0 AND 6 THEN 'Dawn (0-6)'
           WHEN EXTRACT(HOUR FROM order_purchase_timestamp) BETWEEN 7 AND 12 THEN 'Morning (7-12)'
           WHEN EXTRACT(HOUR FROM order_purchase_timestamp) BETWEEN 13 AND 18 THEN 'Afternoon (13-18)'
           WHEN EXTRACT(HOUR FROM order_purchase_timestamp) BETWEEN 19 AND 23 THEN 'Night (19-23)'
         END AS Order_time_of_day
  FROM `Brazil_Market.orders`
)
SELECT Order_time_of_day,
       COUNT(*) AS Number_of_orders
FROM cte
GROUP BY Order_time_of_day
ORDER BY Number_of_orders DESC;
Insights:
➢ Most orders are placed in the afternoon (13-18 hrs), followed by the
night (19-23 hrs) and, closely behind, the morning (7-12 hrs).
➢ Dawn (0-6 hrs) is the least preferred time for customers to place
orders.
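The bucketing rule behind the time-of-day query can be checked on a handful of timestamps before running it on the full table. The following is a minimal sketch in Python that mirrors the four hour ranges of the CASE expression; the sample timestamps are hypothetical and only serve as an illustration.

```python
from collections import Counter
from datetime import datetime


def time_of_day(ts: datetime) -> str:
    """Map a purchase timestamp to the four buckets used in the CASE expression."""
    hour = ts.hour
    if hour <= 6:
        return "Dawn (0-6)"
    if hour <= 12:
        return "Morning (7-12)"
    if hour <= 18:
        return "Afternoon (13-18)"
    return "Night (19-23)"


# Hypothetical purchase timestamps, for illustration only.
sample = [
    datetime(2017, 11, 24, 14, 5),
    datetime(2017, 11, 24, 21, 30),
    datetime(2018, 1, 3, 6, 59),
    datetime(2018, 1, 3, 13, 0),
]

# Count orders per bucket, most frequent first (same idea as GROUP BY + ORDER BY DESC).
counts = Counter(time_of_day(ts) for ts in sample)
print(counts.most_common())
```

Because the ranges are inclusive on both ends, an order placed at 06:59 still counts as Dawn and one placed at 13:00 already counts as Afternoon.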
3. Evolution of E-commerce orders in the Brazil region
SELECT c.customer_state,
       EXTRACT(YEAR FROM o.order_purchase_timestamp) AS order_year,
       EXTRACT(MONTH FROM o.order_purchase_timestamp) AS order_month,
       COUNT(o.order_id) AS Number_of_orders
FROM `Brazil_Market.orders` o
JOIN `Brazil_Market.customers` c ON o.customer_id = c.customer_id
GROUP BY c.customer_state, order_year, order_month
ORDER BY c.customer_state, order_year, order_month;
Insights:
a. Get the % increase in the cost of orders from year 2017 to 2018 (include months
between Jan to Aug only)
WITH y17 AS (
  SELECT ROUND(SUM(p.payment_value)) AS cost_of_orders_2017
  FROM `Brazil_Market.orders` o
  JOIN `Brazil_Market.payments` p ON p.order_id = o.order_id
  WHERE EXTRACT(YEAR FROM o.order_purchase_timestamp) = 2017
    AND EXTRACT(MONTH FROM o.order_purchase_timestamp) BETWEEN 1 AND 8
),
y18 AS (
  SELECT ROUND(SUM(p.payment_value)) AS cost_of_orders_2018
  FROM `Brazil_Market.orders` o
  JOIN `Brazil_Market.payments` p ON p.order_id = o.order_id
  WHERE EXTRACT(YEAR FROM o.order_purchase_timestamp) = 2018
    AND EXTRACT(MONTH FROM o.order_purchase_timestamp) BETWEEN 1 AND 8
)
SELECT y17.cost_of_orders_2017, y18.cost_of_orders_2018,
       CONCAT(ROUND(((y18.cost_of_orders_2018 - y17.cost_of_orders_2017)
                     / y17.cost_of_orders_2017) * 100), '%') AS Percent_increase
FROM y17, y18;
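The percentage in the last column follows the usual relative-change formula, (new - old) / old * 100. A minimal sketch of that arithmetic in Python is shown here; the two totals are hypothetical placeholders, not results from the dataset.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative change from old to new, expressed in percent."""
    if old == 0:
        raise ValueError("the baseline value must be non-zero")
    return (new - old) / old * 100


# Hypothetical Jan-Aug totals, for illustration only.
cost_of_orders_2017 = 200.0
cost_of_orders_2018 = 350.0

change = percent_increase(cost_of_orders_2017, cost_of_orders_2018)
print(f"{round(change)}%")  # prints "75%"
```

The 2017 total is the baseline, so it goes in the denominator; swapping the two values would give the decrease from 2018 back to 2017, which is a different number.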
Insights:
➢ State PB has the highest average order price, 191, which points to
strong purchasing power in that state. This potential can be used to
introduce new products and services in that market.
➢ State SP has the lowest average order price, 110, which may indicate
lower purchasing power in that state, or that our products and
services are not relevant enough to that market.
c. Calculate the Total & Average value of order freight for each state.
Insights:
➢ The data shows the total and average freight value for each state.
➢ This helps compare how cost-efficient each state is relative to the
others when delivering goods to customers.
➢ Action is required to reduce freight cost in the states where it is
higher than expected.
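One way to compute the total and average freight value per state is sketched below. It is an illustration, not the query used for the report: it runs on an in-memory SQLite database through Python's sqlite3 module, reuses the table and column names from the other queries in this document, and fills the tables with a few hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id TEXT, customer_state TEXT);
    CREATE TABLE orders (order_id TEXT, customer_id TEXT);
    CREATE TABLE order_items (order_id TEXT, freight_value REAL);

    -- Hypothetical rows, for illustration only.
    INSERT INTO customers VALUES ('c1', 'SP'), ('c2', 'SP'), ('c3', 'PB');
    INSERT INTO orders VALUES ('o1', 'c1'), ('o2', 'c2'), ('o3', 'c3');
    INSERT INTO order_items VALUES ('o1', 10.0), ('o2', 20.0), ('o3', 45.0), ('o3', 35.0);
    """
)

# Total and average freight per customer state, most expensive state first.
query = """
SELECT c.customer_state,
       ROUND(SUM(oi.freight_value), 2) AS total_freight,
       ROUND(AVG(oi.freight_value), 2) AS avg_freight
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
JOIN order_items oi ON oi.order_id = o.order_id
GROUP BY c.customer_state
ORDER BY avg_freight DESC
"""

rows = conn.execute(query).fetchall()
for state, total_freight, avg_freight in rows:
    print(state, total_freight, avg_freight)
```

The average here is taken per order item, because freight_value is stored on each row of order_items; an average per order would need the items of each order summed first.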
5. Analysis based on sales, freight and delivery time.
a. Find the no. of days taken to deliver each order from the order’s purchase date as
delivery time. Also, calculate the difference (in days) between the estimated &
actual delivery date of an order.
SELECT order_id,
       DATE_DIFF(order_delivered_customer_date, order_purchase_timestamp, DAY) AS Time_to_deliver_days,
       DATE_DIFF(order_estimated_delivery_date, order_delivered_customer_date, DAY) AS diff_estimated_delivery
FROM `Brazil_Market.orders`
ORDER BY order_id;
Insights:
➢ The data shows, for each order, the delivery time and the difference
between the estimated and the actual delivery date.
➢ The gap indicates where initiatives are needed to close it, whether
by increasing the seller base in those locations or by maintaining
enough stock at the local warehouse.
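The sign of the second metric is easy to misread, so a small worked example may help. The sketch below reproduces the two differences in Python on hypothetical dates. It counts calendar-day differences, which is an assumption; the exact rounding BigQuery applies to timestamps may differ.

```python
from datetime import datetime


def delivery_metrics(purchased: datetime, delivered: datetime, estimated: datetime):
    """Return (time_to_deliver_days, diff_estimated_delivery) in calendar days."""
    # Days from purchase to actual delivery.
    time_to_deliver_days = (delivered.date() - purchased.date()).days
    # Estimated minus actual: positive means early, negative means late.
    diff_estimated_delivery = (estimated.date() - delivered.date()).days
    return time_to_deliver_days, diff_estimated_delivery


# Hypothetical orders, for illustration only.
early = delivery_metrics(
    datetime(2018, 3, 1, 9, 30), datetime(2018, 3, 9, 17, 0), datetime(2018, 3, 15)
)
late = delivery_metrics(
    datetime(2018, 3, 1, 9, 30), datetime(2018, 3, 20, 11, 0), datetime(2018, 3, 15)
)

print(early)  # (8, 6): delivered 6 days before the estimate
print(late)   # (19, -5): delivered 5 days after the estimate
```

With this convention, a larger positive diff_estimated_delivery means the order arrived further ahead of the estimate.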
b. Find out the top 5 states with the highest & lowest average freight value.
WITH ranked_data AS (
  SELECT c.customer_state,
         ROUND(AVG(oi.freight_value)) AS avg_freight_value,
         ROW_NUMBER() OVER (ORDER BY AVG(oi.freight_value) DESC) AS rank_high,
         ROW_NUMBER() OVER (ORDER BY AVG(oi.freight_value) ASC) AS rank_low
  FROM `Brazil_Market.customers` c
  JOIN `Brazil_Market.orders` o ON c.customer_id = o.customer_id
  JOIN `Brazil_Market.order_items` oi ON o.order_id = oi.order_id
  GROUP BY c.customer_state
)
SELECT rd_high.customer_state AS highest_state,
       rd_high.avg_freight_value AS highest_avg_freight,
       rd_low.customer_state AS lowest_state,
       rd_low.avg_freight_value AS lowest_avg_freight
FROM ranked_data rd_high
JOIN ranked_data rd_low ON rd_high.rank_high = rd_low.rank_low
WHERE rd_high.rank_high <= 5
ORDER BY rd_high.rank_high;
Insights:
WITH a AS (
  SELECT customer_id,
         DATE_DIFF(order_delivered_customer_date, order_purchase_timestamp, DAY) AS Delivery_time
  FROM `Brazil_Market.orders`
),
ranked_data AS (
  SELECT c.customer_state,
         ROUND(AVG(a.Delivery_time)) AS avg_delivery_time,
         ROW_NUMBER() OVER (ORDER BY AVG(a.Delivery_time) DESC) AS rank_high,
         ROW_NUMBER() OVER (ORDER BY AVG(a.Delivery_time) ASC) AS rank_low
  FROM `Brazil_Market.customers` c
  JOIN a ON a.customer_id = c.customer_id
  GROUP BY c.customer_state
)
SELECT rd_high.customer_state AS highest_state,
       rd_high.avg_delivery_time AS highest_avg_delivery_time,
       rd_low.customer_state AS lowest_state,
       rd_low.avg_delivery_time AS lowest_avg_delivery_time
FROM ranked_data rd_high
JOIN ranked_data rd_low ON rd_high.rank_high = rd_low.rank_low
WHERE rd_high.rank_high <= 5
ORDER BY rd_high.rank_high;
Insights:
➢ The data shows the five states with the highest and the five states
with the lowest average delivery time.
➢ The states with a high average delivery time need attention with
respect to resources and planning. If delivery time does not
improve, it can negatively impact the customer experience.
➢ The states with a low average delivery time can serve as a model for
those with a high delivery time. A strategy can be devised to bring
it down further, benchmarked against the best in the industry.
d. Find out the top 5 states where the order delivery is really fast as compared to the
estimated date of delivery.
WITH cte AS (
  SELECT customer_id, order_id,
         -- estimated date minus actual date: larger values mean earlier-than-estimated delivery
         DATE_DIFF(order_estimated_delivery_date, order_delivered_customer_date, DAY) AS diff_estimated_delivery
  FROM `Brazil_Market.orders`
  WHERE order_delivered_customer_date IS NOT NULL
     OR order_status = 'delivered'
)
SELECT c.customer_state,
       ROUND(AVG(ct.diff_estimated_delivery), 1) AS Avg_delivery_time_day
FROM cte ct
JOIN `Brazil_Market.customers` c ON c.customer_id = ct.customer_id
GROUP BY c.customer_state
-- descending order puts the states that beat the estimate by the most days first
ORDER BY Avg_delivery_time_day DESC
LIMIT 5;
Insights:
➢ The data displays the states where orders are delivered fastest
relative to the estimated delivery date.
➢ These states have the most efficient delivery performance against
the estimate.
6. Analysis based on the payments
a. Find the month on month no. of orders placed using different payment types.
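A possible shape for this query is sketched below as an illustration, not as the query used for the report. It runs on an in-memory SQLite database through Python's sqlite3 module with hypothetical rows. The payment_type column name is an assumption, since it does not appear in the other queries of this document, and strftime stands in for the EXTRACT calls used in the BigQuery queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id TEXT, order_purchase_timestamp TEXT);
    CREATE TABLE payments (order_id TEXT, payment_type TEXT);

    -- Hypothetical rows, for illustration only.
    INSERT INTO orders VALUES
      ('o1', '2018-01-05 10:00:00'),
      ('o2', '2018-01-20 18:30:00'),
      ('o3', '2018-02-02 09:15:00');
    INSERT INTO payments VALUES
      ('o1', 'credit_card'), ('o2', 'credit_card'), ('o3', 'voucher');
    """
)

# Orders per year, month and payment type (payment_type is an assumed column name).
query = """
SELECT CAST(strftime('%Y', o.order_purchase_timestamp) AS INTEGER) AS order_year,
       CAST(strftime('%m', o.order_purchase_timestamp) AS INTEGER) AS order_month,
       p.payment_type,
       COUNT(DISTINCT o.order_id) AS number_of_orders
FROM orders o
JOIN payments p ON p.order_id = o.order_id
GROUP BY order_year, order_month, p.payment_type
ORDER BY order_year, order_month, p.payment_type
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

COUNT(DISTINCT o.order_id) is used so that an order paid in several payment rows of the same type is counted once.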
Insights:
SELECT payment_installments,
       COUNT(order_id) AS Number_of_orders
FROM `Brazil_Market.payments`
WHERE payment_installments >= 1
GROUP BY payment_installments
ORDER BY payment_installments;
Insights:
➢ The data shows the number of orders grouped by the number of
payment installments.
➢ As observed, the largest number of orders falls under a single
installment (payment_installments = 1).
ADDITIONAL QUESTIONS BY ME
WITH cte AS (
  SELECT c.customer_state, p.product_category,
         COUNT(*) AS No_of_orders
  FROM `Brazil_Market.customers` c
  JOIN `Brazil_Market.orders` o ON c.customer_id = o.customer_id
  JOIN `Brazil_Market.order_items` oi ON o.order_id = oi.order_id
  JOIN `Brazil_Market.products` p ON oi.product_id = p.product_id
  GROUP BY c.customer_state, p.product_category
)
SELECT customer_state, product_category, No_of_orders, top_category
FROM (
  SELECT customer_state, product_category, No_of_orders,
         DENSE_RANK() OVER (PARTITION BY customer_state ORDER BY No_of_orders DESC) AS top_category
  FROM cte
) tbl
WHERE top_category <= 3
ORDER BY customer_state, No_of_orders DESC, top_category ASC;
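Because the query ranks with DENSE_RANK, tied categories share a rank and a state can return more than three rows. The sketch below reproduces that ranking rule in plain Python so the effect of ties is visible; the state, category names and counts are hypothetical.

```python
from collections import defaultdict


def top_categories(counts, k=3):
    """Keep rows whose dense rank within their state is <= k.

    counts is an iterable of (state, category, number_of_orders) tuples.
    """
    by_state = defaultdict(list)
    for state, category, n in counts:
        by_state[state].append((category, n))

    result = []
    for state in sorted(by_state):
        # Distinct counts in descending order define the dense ranks 1, 2, 3, ...
        distinct = sorted({n for _, n in by_state[state]}, reverse=True)
        rank_of = {n: i + 1 for i, n in enumerate(distinct)}
        for category, n in sorted(by_state[state], key=lambda r: (-r[1], r[0])):
            if rank_of[n] <= k:
                result.append((state, category, n, rank_of[n]))
    return result


# Hypothetical per-state category counts, for illustration only.
sample = [
    ("SP", "bed_bath", 10),
    ("SP", "health", 10),
    ("SP", "sports", 7),
    ("SP", "toys", 5),
    ("SP", "auto", 2),
]

top = top_categories(sample)
for row in top:
    print(row)
```

Here two categories tie for rank 1, so four rows pass the filter. If exactly three rows per state are needed, ROW_NUMBER would be the alternative, at the cost of breaking ties arbitrarily.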
Insights:
WITH a AS (
  SELECT c.customer_state, COUNT(r.review_id) AS Lowest_score
  FROM `Brazil_Market.customers` c
  JOIN `Brazil_Market.orders` o ON c.customer_id = o.customer_id
  JOIN `Brazil_Market.order_reviews` r ON o.order_id = r.order_id
  WHERE r.review_score < 3
  GROUP BY c.customer_state
),
b AS (
  SELECT c.customer_state, COUNT(r.review_id) AS Total_score
  FROM `Brazil_Market.customers` c
  JOIN `Brazil_Market.orders` o ON c.customer_id = o.customer_id
  JOIN `Brazil_Market.order_reviews` r ON o.order_id = r.order_id
  GROUP BY c.customer_state
)
SELECT a.customer_state,
       ROUND((a.Lowest_score / b.Total_score) * 100, 2) AS lowscore_percent
FROM a
JOIN b ON a.customer_state = b.customer_state
ORDER BY lowscore_percent DESC
LIMIT 10;
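The share computed by this query is the number of reviews with a score below 3 divided by all reviews of the state, times 100. A minimal Python sketch of the same calculation on hypothetical review rows follows.

```python
from collections import defaultdict


def low_score_percent(reviews, threshold=3):
    """Percentage of reviews per state with a score below the threshold.

    reviews is an iterable of (customer_state, review_score) pairs.
    """
    totals = defaultdict(int)
    lows = defaultdict(int)
    for state, score in reviews:
        totals[state] += 1
        if score < threshold:
            lows[state] += 1
    return {state: round(lows[state] / totals[state] * 100, 2) for state in totals}


# Hypothetical review rows, for illustration only.
sample = [
    ("RJ", 1), ("RJ", 5), ("RJ", 2), ("RJ", 4),
    ("SP", 5), ("SP", 2), ("SP", 4), ("SP", 5),
]

shares = low_score_percent(sample)
# Highest share of low scores first, like ORDER BY lowscore_percent DESC.
for state, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(state, pct)
```

One difference from the SQL version: a state without any low-score review appears here with 0.0, whereas the inner join between a and b drops it from the result.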
Insights:
❖ The monthly seasonality report shows strong sales potential at the end of 2017 and the
start of 2018. We can capitalize on our strengths and put more marketing effort into
boosting sales during these months.
❖ All teams and channel partners need to be prepared, and SKUs and stock kept up to
date, so that we are ready for these high-potential months.
❖ For the months that are not doing well, we need to understand whether the reasons are
internal or external. If internal, we need to identify the gap and step up marketing
efforts, product pricing, or service quality.
❖ If the reasons are external, we need to identify them and understand what customers
want during those times, whether in products or services. Since we already have a
channel in place, we can introduce an innovative service or product to reach our
customers. This would help us stay relevant in the market at all times and build
confidence and trust.
❖ For example, state RJ shows good order numbers in 2017-2018. We can plan to
allocate resources accordingly and keep our cost-to-profit ratio in check.
❖ A strategy needs to be planned for the states with the lowest order numbers in order
to increase our footprint there.
❖ There are specific months during which sales in a particular state are very high,
possibly because of festivals or other reasons. We need to ensure we are prepared in
terms of stock and resources during those months in that state.
❖ For the states with fewer customers, a strategy can be devised to increase the
customer base in those states.
❖ For the states with a large customer base, the cost-to-profit ratio can be reviewed
while improving profits and reducing identified overheads.