Brazilian Retailer Case Study
1. Import the dataset and do the usual exploratory analysis steps, such as checking the structure &
characteristics of the dataset:
Answer:
Insights:
In the customers table, customer_id, customer_unique_id, customer_city and customer_state are
of “STRING” data type, and customer_zip_code_prefix is of “INTEGER” data type.
1.B: Get the time range between which the orders were placed.
Answer:
Insights:
From the data set, we see that the first order was placed on 04.09.2016 and the last order
was placed on 17.10.2018 (DD.MM.YYYY).
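A query along the following lines would return that range. It is only a sketch, assuming the `Target_SQL.orders` table and the order_purchase_timestamp column used elsewhere in this report; the output column names are made up for illustration.

```sql
-- Earliest and latest purchase timestamps in the orders table
select min(order_purchase_timestamp) as first_order,
       max(order_purchase_timestamp) as last_order
from `Target_SQL.orders`
```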
1.C. Count the Cities & States of customers who ordered during the given period.
Answer:
Insights:
Customers who ordered are from 4119 cities in 27 states.
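One way to arrive at such counts is sketched below. It assumes that customers and orders are linked through customer_id, as in the joins used later in this report; city_count and state_count are illustrative names.

```sql
-- Distinct cities and states among customers who placed at least one order
select count(distinct c.customer_city)  as city_count,
       count(distinct c.customer_state) as state_count
from `Target_SQL.customers` c
inner join `Target_SQL.orders` o
  on o.customer_id = c.customer_id
```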
2. In-depth Exploration:
2.A. Is there a growing trend in the no. of orders placed over the past years?
Answer:
Insights:
Yes, there is a growing trend in the no. of orders placed over the past years.
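A trend like this can be checked by counting orders per year and month. The query below is a hedged sketch against the `Target_SQL.orders` table; order_count is an illustrative alias.

```sql
-- Number of orders per calendar year and month, in chronological order
select extract(year from order_purchase_timestamp)  as year,
       extract(month from order_purchase_timestamp) as month,
       count(order_id) as order_count
from `Target_SQL.orders`
group by 1, 2
order by 1, 2
```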
2.B. Can we see some kind of monthly seasonality in terms of the no. of orders being placed?
Answer:
Insights:
Yes, we can see some kind of monthly seasonality, in “Jan 2018 and March 2018” and in “Feb
2018 and April 2018”.
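To compare the same calendar month across years, the counts can be placed side by side. This is a sketch only: it assumes the two full-activity years are 2017 and 2018, and the output column names are illustrative.

```sql
-- Orders per calendar month, with 2017 and 2018 in separate columns
select extract(month from order_purchase_timestamp) as month,
       countif(extract(year from order_purchase_timestamp) = 2017) as orders_2017,
       countif(extract(year from order_purchase_timestamp) = 2018) as orders_2018
from `Target_SQL.orders`
group by 1
order by 1
```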
2.C. During what time of the day, do the Brazilian customers mostly place their orders? (Dawn,
Morning, Afternoon or Night)
i) 0-6 hrs : Dawn
ii) 7-12 hrs : Mornings
iii) 13-18 hrs : Afternoon
iv) 19-23 hrs : Night
Answer:
Insights:
From the given data set we see that Brazilian customers place most of their orders in the Afternoon,
followed by Night and Morning. Very few orders are placed at Dawn.
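The hour ranges given in the question can be turned into buckets with a CASE expression. The following is a minimal sketch, assuming the hour is read directly from order_purchase_timestamp as stored; time_of_day and order_count are illustrative names.

```sql
-- Bucket each order by the hour of purchase and count orders per bucket
select case
         when extract(hour from order_purchase_timestamp) between 0 and 6   then 'Dawn'
         when extract(hour from order_purchase_timestamp) between 7 and 12  then 'Morning'
         when extract(hour from order_purchase_timestamp) between 13 and 18 then 'Afternoon'
         else 'Night'  -- remaining hours, 19 to 23
       end as time_of_day,
       count(order_id) as order_count
from `Target_SQL.orders`
group by 1
order by 2 desc
```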
3. Evolution of E-commerce orders in the Brazil region:
3.A. Get the month on month no. of orders placed in each state.
Answer:
Insights:
From the given data set we obtained the month-on-month no. of orders placed in each state.
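A possible form of such a query is sketched below, assuming the same customer_id join between orders and customers that appears later in this report; order_count is an illustrative alias.

```sql
-- Orders per state for each year and month
select c.customer_state,
       extract(year from o.order_purchase_timestamp)  as year,
       extract(month from o.order_purchase_timestamp) as month,
       count(o.order_id) as order_count
from `Target_SQL.orders` o
inner join `Target_SQL.customers` c
  on c.customer_id = o.customer_id
group by 1, 2, 3
order by 1, 2, 3
```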
3.B. How are the customers distributed across all the states?
Answer:
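A distribution of this kind could be obtained with a query like the one below. It is a sketch, not the original answer, and it assumes that customer_unique_id identifies one customer; customer_count is an illustrative alias.

```sql
-- Number of distinct customers in each state, largest first
select customer_state,
       count(distinct customer_unique_id) as customer_count
from `Target_SQL.customers`
group by 1
order by 2 desc
```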
4.A. Get the % increase in the cost of orders from year 2017 to 2018 (Jan to Aug only).
Answer:
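-- Total payment value for Jan-Aug of 2017 and 2018, with the % increase over the previous year
with final as
(select extract(year from o.order_purchase_timestamp) as year,
        round(sum(p.payment_value),2) as total_cost
 from `Target_SQL.orders` o
 inner join `Target_SQL.payments` p
 on p.order_id = o.order_id
 -- a date-only upper bound is read as midnight, so orders placed later on 31 Aug are excluded
 where o.order_purchase_timestamp between '2017-01-01' and '2017-08-31' or
       o.order_purchase_timestamp between '2018-01-01' and '2018-08-31'
 group by 1)
select year, total_cost,
       lag(total_cost) over(order by year) as prev_year_cost,
       round((total_cost - lag(total_cost) over(order by year)) * 100 /
             lag(total_cost) over(order by year), 2) as pct_increase
from final
order by year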
Insights:
From the given data set we found that the cost of orders increased by 138% from 2017 to
2018 (January to August of each year).
4.B. Calculate the Total & Average value of order price for each state.
Answer:
4.C. Calculate the Total & Average value of order freight for each state.
Answer:
Insights:
From the given data set we obtained the total and average freight value of orders
for each state.
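A sketch of how these figures might be computed follows. The freight_value column in `Target_SQL.order_items` is an assumption, since the report does not show it; total_freight and avg_freight are illustrative names.

```sql
-- Total and average freight per customer state
-- (freight_value is an assumed column name in order_items)
select c.customer_state,
       round(sum(oi.freight_value), 2) as total_freight,
       round(avg(oi.freight_value), 2) as avg_freight
from `Target_SQL.order_items` oi
inner join `Target_SQL.orders` o
  on o.order_id = oi.order_id
inner join `Target_SQL.customers` c
  on c.customer_id = o.customer_id
group by 1
order by 1
```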
5. Analysis based on sales, freight and delivery time.
5.A. Find the no. of days taken to deliver each order from the order’s purchase date as delivery time.
Also, calculate the difference (in days) between the estimated & actual delivery date of an order.
Do this in a single query.
You can calculate the delivery time and the difference between the estimated & actual delivery date
using the given formula:
● time_to_deliver = order_delivered_customer_date - order_purchase_timestamp
● diff_estimated_delivery = order_estimated_delivery_date - order_delivered_customer_date
Answer:
Insights:
From the given data set we obtained the delivery time of each order and the difference between its estimated and actual delivery dates.
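The two formulas from the question translate directly into date_diff calls, the same function used in the 5.D query of this report. The sketch below assumes that undelivered orders have a null order_delivered_customer_date and leaves them out.

```sql
-- Delivery time and estimate-vs-actual difference, in days, for each delivered order
select order_id,
       date_diff(order_delivered_customer_date, order_purchase_timestamp, day)
         as time_to_deliver,
       date_diff(order_estimated_delivery_date, order_delivered_customer_date, day)
         as diff_estimated_delivery
from `Target_SQL.orders`
where order_delivered_customer_date is not null
```

A positive diff_estimated_delivery means the order arrived before the estimated date; a negative value means it arrived late.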
5.B. Find out the top 5 states with the highest & lowest average freight value.
Answer:
Insights:
From the given data set we identified the five states with the highest average freight value and
the five states with the lowest.
5.C. Find out the top 5 states with the highest & lowest average delivery time.
Answer:
Insights:
From the given data set we identified the five states with the highest average delivery time and
the five states with the lowest.
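Both ends of the ranking can be pulled in one query by ranking the state averages in each direction. This is a hedged sketch; avg_delivery_days, slowest_rank and fastest_rank are illustrative names, and delivery time is taken as the day difference between purchase and delivery, as defined in 5.A.

```sql
-- Average delivery time per state, keeping the 5 slowest and 5 fastest states
with state_delivery as (
  select c.customer_state,
         round(avg(date_diff(o.order_delivered_customer_date,
                             o.order_purchase_timestamp, day)), 2) as avg_delivery_days
  from `Target_SQL.orders` o
  inner join `Target_SQL.customers` c
    on c.customer_id = o.customer_id
  where o.order_delivered_customer_date is not null
  group by 1
),
ranked as (
  select customer_state,
         avg_delivery_days,
         row_number() over (order by avg_delivery_days desc) as slowest_rank,
         row_number() over (order by avg_delivery_days asc)  as fastest_rank
  from state_delivery
)
select customer_state, avg_delivery_days, slowest_rank, fastest_rank
from ranked
where slowest_rank <= 5 or fastest_rank <= 5
order by avg_delivery_days desc
```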
5.D. Find out the top 5 states where the order delivery is really fast as compared to the estimated date
of delivery.
You can use the difference between the averages of actual & estimated delivery date to figure out how
fast the delivery was for each state.
Answer:
-- Average number of days by which delivery beat the estimated date, per state
with final as
(select c.customer_state,
        ord.order_id,
        ord.order_estimated_delivery_date,
        ord.order_delivered_customer_date
 -- one row per order, so multi-item orders are not counted more than once
 from `Target_SQL.customers` c
 inner join `Target_SQL.orders` ord
 on ord.customer_id = c.customer_id
 where ord.order_delivered_customer_date is not null)
select customer_state,
       round(avg(date_diff(order_estimated_delivery_date, order_delivered_customer_date, day)),2)
       as avg_days_early
from final
group by 1
order by 2 desc
limit 5
Insights:
From the given data set we identified the five states where delivery is fastest compared with the estimated delivery date.
Answer:
Insights:
From the given data set we found that most purchases were made by credit card.
6.B. Find the no. of orders placed on the basis of the payment installments that have been paid.
Answer: