Brazilian Retailer Case Study

The document outlines a comprehensive analysis of a dataset related to customer orders, including exploratory data analysis, trends in order placements, customer distribution across states, and financial impacts on the economy. Key insights reveal a growing trend in orders, seasonal patterns, and significant increases in order costs from 2017 to 2018. Additionally, it examines delivery times, freight values, and payment methods used by customers.

Uploaded by

pavithran s

1. Import the dataset and do usual exploratory analysis steps like checking the structure &
characteristics of the dataset:

1.A: Data type of all columns in the “customers” table.

Answer:

select column_name, data_type
from Target_SQL.INFORMATION_SCHEMA.COLUMNS
where table_name = 'customers'

Insights:
We see that customer_id, customer_unique_id, customer_city and customer_state are of the
“STRING” data type, and customer_zip_code_prefix is of the “INTEGER” data type.

1.B: Get the time range between which the orders were placed.

Answer:

select max(order_purchase_timestamp) as last_order,
min(order_purchase_timestamp) as first_order
from `Target_SQL.orders`

Insights:
From the data set, we see that the first order was placed on 04.09.2016 and the last order
was placed on 17.10.2018.

1.C: Count the Cities & States of customers who ordered during the given period.

Answer:

select count(distinct customer_city) as customer_city,
count(distinct customer_state) as customer_state
from `Target_SQL.customers`

Insights:
Customers who ordered are from 4119 cities in 27 states.

2. In-depth Exploration:

2.A. Is there a growing trend in the no. of orders placed over the past years?

Answer:

select year, count(year) as no_of_orders
from(select *, extract(year from order_purchase_timestamp) as year
from `Target_SQL.orders`)a
group by year
order by year

Insights:
Yes, there is a growing trend in the no. of orders placed over the past years.

2.B. Can we see some kind of monthly seasonality in terms of the no. of orders being placed?

Answer:

select month_name, month, year, count(month) as no_of_orders
from(select *, extract(month from order_purchase_timestamp) as month,
extract(year from order_purchase_timestamp) as year,
format_datetime('%b', order_purchase_timestamp) as month_name
from `Target_SQL.orders`)a
group by 1,2,3
order by 2,3

Insights:
Yes, we can see some kind of monthly seasonality in “Jan 2018 and March 2018” and “Feb
2018 and April 2018”

2.C. During what time of the day do the Brazilian customers mostly place their orders? (Dawn,
Morning, Afternoon or Night)
i) 0-6 hrs : Dawn
ii) 7-12 hrs : Mornings
iii) 13-18 hrs : Afternoon
iv) 19-23 hrs : Night

Answer:

select purchase_time, count(purchase_time) as no_of_orders
from(select order_purchase_timestamp, case
when hour between 0 and 6 then 'Dawn'
when hour between 7 and 12 then 'Mornings'
when hour between 13 and 18 then 'Afternoon'
when hour between 19 and 23 then 'Night'
end as purchase_time
from(select order_purchase_timestamp, extract(hour from order_purchase_timestamp) as hour
from `Target_SQL.orders`)a)b
group by 1
order by 2 desc

Insights:
From the given data set we see that Brazilian customers place most of their orders in the
Afternoon, followed by Night and Mornings. Very few orders are placed at Dawn.
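The bucketing rule from question 2.C can also be sketched outside SQL. The following is a minimal Python illustration, assuming the hour boundaries listed in the question; the function name `time_of_day` and the sample hours are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def time_of_day(hour):
    """Map an hour (0-23) to the bucket defined in question 2.C."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    if hour <= 6:
        return "Dawn"
    if hour <= 12:
        return "Mornings"
    if hour <= 18:
        return "Afternoon"
    return "Night"


# Hypothetical sample of purchase hours, for illustration only.
sample_hours = [2, 9, 14, 15, 20, 23, 13]

# Count how many sample orders fall into each bucket.
counts = Counter(time_of_day(h) for h in sample_hours)
```

Each boundary hour (6, 12, 18) belongs to the earlier bucket, which matches the inclusive `between` ranges in the query.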
3. Evolution of E-commerce orders in the Brazil region:
3.A. Get the month on month no. of orders placed in each state.

Answer:

select customer_state, extract(month from order_purchase_timestamp) as month,
count(order_id) as no_of_orders
from `Target_SQL.customers` c
inner join `Target_SQL.orders` o
on c.customer_id = o.customer_id
group by 1,2
order by 1,2

Insights:
From the given data set we found the month on month orders placed in each state.

3.B. How are the customers distributed across all the states?

Answer:

select customer_state, count(c.customer_id) as no_of_customers
from `Target_SQL.customers` c
inner join `Target_SQL.orders` o
on c.customer_id = o.customer_id
group by 1
order by 2 desc

Insights:
From the given data set we found that the largest number of customers is from the state of SP.

4. Impact on Economy: Analyze the money movement by e-commerce by looking at order prices, freight and others.
4.A. Get the % increase in the cost of orders from year 2017 to 2018 (include months between Jan to
Aug only).
You can use the “payment_value” column in the payments table to get the cost of orders.

Answer:

with final as
-- total payment value per year, restricted to the two date ranges below
(select extract(year from order_purchase_timestamp) as year,
round(sum(payment_value),2) as total_cost
from `Target_SQL.orders` o
inner join `Target_SQL.payments` p
on p.order_id = o.order_id
-- a date-only string bound is read as midnight, so each range ends at the start of 31 Aug
where order_purchase_timestamp between '2017-01-01' and '2017-08-31' or
order_purchase_timestamp between '2018-01-01' and '2018-08-31'
group by 1)

-- pivot the two yearly totals into one row and compute the % change
select year_2017, year_2018,
round(((year_2018 - year_2017)/ year_2017)*100,2) as percentage
from(select sum(case when year = 2017 then total_cost end) as year_2017,
sum(case when year = 2018 then total_cost end) as year_2018
from final)b

Insights:
From the given data set we found a 138% increase in the cost of orders from 2017 to 2018
(January to August of each year).
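The percentage in the final select follows the usual formula (year_2018 - year_2017) / year_2017 * 100. A short Python sketch of that arithmetic is given here; the helper name `pct_increase` is hypothetical and the totals are made up for illustration, not the dataset's figures.

```python
def pct_increase(old_total, new_total):
    """Percentage change from old_total to new_total, rounded to 2 places."""
    if old_total == 0:
        raise ZeroDivisionError("old_total must be non-zero")
    return round((new_total - old_total) / old_total * 100, 2)


# Made-up totals for illustration; these are not the dataset's values.
example = pct_increase(200.0, 500.0)  # (500 - 200) / 200 * 100
```

Read this way, an increase of 138% means the 2018 total is roughly 2.38 times the 2017 total.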

4.B. Calculate the Total & Average value of order price for each state.

Answer:

select customer_state, round(sum(price),2) as total_price,
round(avg(price),2) as average_price
from `Target_SQL.customers` c
inner join `Target_SQL.orders` ord
on ord.customer_id = c.customer_id
inner join `Target_SQL.order_items` o
on o.order_id = ord.order_id
group by 1
order by 1

Insights:
From the given data set we found the total and average order price for each state.

4.C. Calculate the Total & Average value of order freight for each state.

Answer:

select customer_state,
round(sum(freight_value),2) as total_freight_price,
round(avg(freight_value),2) as average_freight_price
from `Target_SQL.customers` c
inner join `Target_SQL.orders` ord
on ord.customer_id = c.customer_id
inner join `Target_SQL.order_items` o
on o.order_id = ord.order_id
group by 1
order by 1

Insights:
From the given data set we found the total and average freight value for each state.

5. Analysis based on sales, freight and delivery time.
5.A. Find the no. of days taken to deliver each order from the order’s purchase date as delivery time.
Also, calculate the difference (in days) between the estimated & actual delivery date of an order.
Do this in a single query.
You can calculate the delivery time and the difference between the estimated & actual delivery date
using the given formula:
● time_to_deliver = order_delivered_customer_date - order_purchase_timestamp
● diff_estimated_delivery = order_estimated_delivery_date - order_delivered_customer_date

Answer:

select order_id,
timestamp_diff(order_delivered_customer_date, order_purchase_timestamp, day) as time_to_deliver,
timestamp_diff(order_estimated_delivery_date, order_delivered_customer_date, day) as diff_estimated_delivery
from `Target_SQL.orders`

Insights:
From the given data set we found the delivery time of each order and the difference between its estimated and actual delivery dates.
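The two formulas from the question can be worked through on a single order. The sketch below is a Python illustration only: the helper `whole_days` is hypothetical, the timestamps are made up, and truncating partial days toward zero is an assumption chosen for the sketch rather than a statement about how the SQL function rounds.

```python
from datetime import datetime


def whole_days(later, earlier):
    """Whole days from earlier to later, truncated toward zero (assumption)."""
    return int((later - earlier).total_seconds() / 86400)


# Hypothetical order timestamps, for illustration only.
purchased = datetime(2018, 1, 1, 10, 0)
delivered = datetime(2018, 1, 9, 18, 30)
estimated = datetime(2018, 1, 15, 0, 0)

# time_to_deliver = order_delivered_customer_date - order_purchase_timestamp
time_to_deliver = whole_days(delivered, purchased)

# diff_estimated_delivery = order_estimated_delivery_date - order_delivered_customer_date
diff_estimated_delivery = whole_days(estimated, delivered)
```

With this sign convention a positive diff_estimated_delivery means the order arrived before the estimated date, and a negative value means it arrived late.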

5.B. Find out the top 5 states with the highest & lowest average freight value.

Answer:

with state_freight as
-- average freight value per state, computed once
(select customer_state, round(avg(freight_value),2) as avg_freight_value
from `Target_SQL.customers` c
inner join `Target_SQL.orders` ord
on ord.customer_id = c.customer_id
inner join `Target_SQL.order_items` o
on o.order_id = ord.order_id
group by 1),
ranked as
-- rank every state from both ends: 1 = highest and 1 = lowest
(select customer_state, avg_freight_value,
row_number() over(order by avg_freight_value desc) as high_rnk,
row_number() over(order by avg_freight_value asc) as low_rnk
from state_freight)

-- pair the n-th highest state with the n-th lowest state
select h.customer_state as highest_avg_freight_state,
h.avg_freight_value as highest_avg_freight_value,
l.customer_state as lowest_avg_freight_state,
l.avg_freight_value as lowest_avg_freight_value
from ranked h
inner join ranked l
on h.high_rnk = l.low_rnk
where h.high_rnk <= 5
order by h.high_rnk

Insights:
From the given data set we found the five states with the highest average freight value and
the five states with the lowest average freight value.
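The "top 5 and bottom 5 by average" pattern can be illustrated with a small Python sketch. The function name `top_and_bottom`, the state codes and the freight values are all hypothetical; the sketch only shows the idea of averaging per group, sorting once, and reading both ends of the sorted list.

```python
from statistics import mean


def top_and_bottom(freight_by_state, n=5):
    """Return (highest, lowest) lists of (state, average) pairs, n each."""
    # Average per state, skipping states with no values.
    averages = {s: round(mean(v), 2) for s, v in freight_by_state.items() if v}
    ranked = sorted(averages.items(), key=lambda kv: kv[1])  # ascending
    lowest = ranked[:n]
    highest = ranked[::-1][:n]
    return highest, lowest


# Made-up freight values keyed by made-up state codes, for illustration only.
sample = {"AA": [10.0, 20.0], "BB": [5.0], "CC": [30.0, 50.0], "DD": [12.5]}
highest, lowest = top_and_bottom(sample, n=2)
```

Sorting once and slicing both ends avoids computing the per-state averages twice.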

5.C. Find out the top 5 states with the highest & lowest average delivery time.

Answer:

with state_delivery as
-- average delivery time per state; order_items is not joined, so each order counts once
(select customer_state,
round(avg(timestamp_diff(order_delivered_customer_date, order_purchase_timestamp, day)),2) as avg_time_to_deliver
from `Target_SQL.customers` c
inner join `Target_SQL.orders` ord
on ord.customer_id = c.customer_id
group by 1),
ranked as
-- rank every state from both ends: 1 = slowest and 1 = fastest
(select customer_state, avg_time_to_deliver,
row_number() over(order by avg_time_to_deliver desc) as high_rnk,
row_number() over(order by avg_time_to_deliver asc) as low_rnk
from state_delivery)

-- pair the n-th slowest state with the n-th fastest state
select h.customer_state as highest_avg_time_deliver_state,
h.avg_time_to_deliver as highest_average_time_deliver,
l.customer_state as lowest_avg_time_deliver_state,
l.avg_time_to_deliver as lowest_average_time_deliver
from ranked h
inner join ranked l
on h.high_rnk = l.low_rnk
where h.high_rnk <= 5
order by h.high_rnk

Insights:
From the given data set we found the five states with the highest average delivery time and
the five states with the lowest average delivery time.

5.D. Find out the top 5 states where the order delivery is really fast as compared to the estimated date
of delivery.
You can use the difference between the averages of actual & estimated delivery date to figure out how
fast the delivery was for each state.

Answer:

with final as
-- one row per order with its state and both delivery dates
(select c.customer_state, ord.order_id, ord.order_estimated_delivery_date,
ord.order_delivered_customer_date
from `Target_SQL.customers` c
inner join `Target_SQL.orders` ord
on ord.customer_id = c.customer_id)

-- a larger average means orders arrive further ahead of the estimated date
select customer_state,
round(avg(timestamp_diff(order_estimated_delivery_date, order_delivered_customer_date, day)),2)
as avg_days_before_estimate
from final
group by 1
order by 2 desc
limit 5

Insights:
From the given data set we found the five states where delivery is fastest compared to the estimated date.

6. Analysis based on the payments:


6.A. Find the month on month no. of orders placed using different payment types.

Answer:

select p.payment_type,
extract(month from order_purchase_timestamp) as month,
extract(year from order_purchase_timestamp) as year,
-- distinct, so an order with several payment rows of one type is counted once
count(distinct o.order_id) as no_of_orders
from `Target_SQL.orders` o
inner join `Target_SQL.payments` p
on o.order_id = p.order_id
group by 1,2,3
order by 3,2,1

Insights:
From the given data set we found that most purchases were made by credit card.
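The grouping behind this kind of month-on-month count can be sketched in Python. The sketch assumes that an order may appear in more than one payment row and therefore counts distinct order IDs per group; the rows, order IDs and payment type labels below are hypothetical.

```python
from collections import defaultdict

# Hypothetical payment rows: (order_id, payment_type, year, month).
payments = [
    ("o1", "credit_card", 2018, 1),
    ("o2", "voucher", 2018, 1),
    ("o2", "voucher", 2018, 1),  # same order, second voucher payment
    ("o3", "credit_card", 2018, 2),
]

# Collect the set of order IDs seen for each (year, month, payment type).
orders_by_key = defaultdict(set)
for order_id, payment_type, year, month in payments:
    orders_by_key[(year, month, payment_type)].add(order_id)

# Number of distinct orders per group, in calendar order.
monthly_counts = {key: len(ids) for key, ids in sorted(orders_by_key.items())}
```

Counting rows instead of distinct order IDs would report 2 voucher orders for January 2018 in this sample, although only one order used a voucher.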

6.B. Find the no. of orders placed on the basis of the payment installments that have been paid.

Answer:

select payment_installments, count(distinct o.order_id) as no_of_orders
from `Target_SQL.orders` o
inner join `Target_SQL.payments` p
on o.order_id = p.order_id
where payment_installments > 0
group by 1
order by 1

Insights:
From the given data set we found the number of orders for each number of payment installments.
