0% found this document useful (0 votes)
3 views

Target SQL - Reference

Uploaded by

Gaurav 02
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views

Target SQL - Reference

Uploaded by

Gaurav 02
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 11

Target SQL Business Case

❖ Topic: SQL
❖ Duration: 1 week
____________________________________________________________________________

Why this case study?

From company’s perspective:


● Target is a globally renowned brand and a prominent retailer in the United States.
Target makes itself a preferred shopping destination by offering outstanding value,
inspiration, innovation and an exceptional guest experience that no other retailer can
deliver.
● This particular business case focuses on the operations of Target in Brazil and provides
insightful information about 100,000 orders placed between 2016 and 2018. The
dataset offers a comprehensive view of various dimensions including the order status,
price, payment and freight performance, customer location, product attributes, and
customer reviews.
● By analyzing this extensive dataset, it becomes possible to gain valuable insights into
Target's operations in Brazil. The information can shed light on various aspects of the
business, such as order processing, pricing strategies, payment and shipping efficiency,
customer demographics, product characteristics, and customer satisfaction levels.

From learner’s perspective:


● Solving this business case holds immense importance for aspiring data analysts and
scientists.
● SQL is the backbone of data analysis and management in the modern era. By working
through this case study, individuals gain hands-on experience and practical skills in
manipulating and querying databases, which are fundamental to extracting insights
from large volumes of data.
● Solving this particular case study will help in developing proficiency in constructing
complex SQL queries and understanding real-world data scenarios.
● Additionally, it will enhance one's ability to communicate with the stakeholders
involved in data-related projects and help the organization take better, data-driven
decisions.
____________________________________________________________________________
Dataset:
https://drive.google.com/drive/folders/1TGEc66YKbD443nslRi1bWgVd238gJCnb?usp=sharing

The data is available in 8 different csv files:


1. customers.csv
2. geolocation.csv
3. order_items.csv
4. payments.csv
5. reviews.csv
6. orders.csv
7. products.csv
8. sellers.csv

The column description for these csv files is given below.

The customers.csv contain following features:

Features Description

customer_id ID of the consumer who made the purchase

customer_unique_id Unique ID of the consumer

customer_zip_code_prefix Zip Code of consumer’s location

customer_city Name of the City from where order is made

customer_state State Code from where order is made (Eg. são paulo - SP)

The geolocations.csv contain following features:

Features Description

geolocation_zip_code_prefix First 5 digits of Zip Code

geolocation_lat Latitude

geolocation_lng Longitude

geolocation_city City

geolocation_state State
The sellers.csv contains following features:

Features Description
Unique ID of the seller registered
seller_id

seller_zip_code_prefix Zip Code of the seller’s location

seller_city Name of the City of the seller

seller_state State Code (Eg. são paulo - SP)

The order_items.csv contain following features:

Features Description

order_id A Unique ID of order made by the consumers

order_item_id A Unique ID given to each item ordered in


the order

product_id A Unique ID given to each product available


on the site

seller_id Unique ID of the seller registered in Target

shipping_limit_date The date before which shipping of the


ordered product must be completed

price Actual price of the products ordered

freight_value Price rate at which a product is delivered


from one point to another

The payments.csv contain following features:

Features Description

order_id A Unique ID of order made by the


consumers.
payment_sequential Sequences of the payments made in case of
EMI

payment_type Mode of payment used (Eg. Credit Card)

payment_installments Number of installments in case of EMI


purchase.

payment_value Total amount paid for the purchase order

The orders.csv contain following features:

Features Description

order_id A Unique ID of order made by the consumers

customer_id ID of the consumer who made the purchase

order_status Status of the order made i.e. delivered,


shipped, etc.

order_purchase_timestamp Timestamp of the purchase

order_delivered_carrier_date Delivery date at which carrier made the


delivery

order_delivered_customer_date Date at which customer got the product

order_estimated_delivery_date Estimated delivery date of the products

The reviews.csv contain following features:

Features Description

review_id ID of the review given on the product


ordered by the order id

order_id A Unique ID of order made by the consumers

review_score Review score given by the customer for each


order on a scale of 1-5

review_comment_title Title of the review

review_comment_message Review comments posted by the consumer


for each order
review_creation_date Timestamp of the review when it is created

review_answer_timestamp Timestamp of the review answered

The products.csv contain following features:

Features Description

product_id A Unique identifier for the proposed project.

product_category_name Name of the product category

product_name_lenght Length of the string which specifies the name


given to the products ordered

product_description_lenght Length of the description written for each


product ordered on the site.

product_photos_qty Number of photos of each product ordered


available on the shopping portal

product_weight_g Weight of the products ordered in grams

product_length_cm Length of the products ordered in


centimeters

product_height_cm Height of the products ordered in


centimeters

product_width_cm Width of the product ordered in centimeters

Dataset schema:
____________________________________________________________________________

How to get started?

To complete the case study, begin by downloading the CSV files from the provided link.
Afterward, proceed to upload them onto BigQuery for further analysis.

Assuming you have already established a project called "Ecommerce" in your SQL workspace,
the next step is to create a new dataset named "target."

Now, systematically upload each CSV file as a separate BigQuery table within the "target"
dataset.

Once all the files have been successfully uploaded, you can conveniently access them within
the query editor by referencing the dataset and table names.

For example,
● To access the "customers" table, you would use the reference `target.customers`.
● Similarly, for the "orders" table, you would use `target.orders`, and so on.
____________________________________________________________________________
What is expected?
Assuming you are a data analyst/ scientist at Target, you have been assigned the task of
analyzing the given dataset to extract valuable insights and provide actionable
recommendations.

Submission Process:
Once you’re done with the case study...

● Use a Word document to paste your SQL queries along with a screenshot of the first
10 rows from the output.
● List down any valuable insights that you find during the analysis and provide some
action items from the company’s perspective in order to improve the current situation.
● Convert your solutions doc into a PDF, and upload the same on the platform.
● Please note that after submitting once, you will not be allowed to edit your submission.

General Guidelines:
Evaluation will be kept lenient, so make sure you attempt this case study.

It is understandable that you might struggle with getting started on this or feel stuck at

some point.
In such case:
a. Read the question carefully and try to understand what exactly is being asked.
b. Brainstorm a little. If you’re getting an error, remember that Google is your best
friend.
c. You can watch the lecture recordings or go through your lecture notes once
again if you feel like you’re getting confused over some specific topics.
d. Discuss your problems with your peers. Make use of the Slack channel and
WhatsApp group.
e. Only if you think that there’s a major issue, you can reach out to your Instructor
via Slack or Email.
____________________________________________________________________________

What does ‘good’ look like?

I. Import the dataset and do usual exploratory analysis steps like checking the
structure & characteristics of the dataset.
A. Data type of all columns in the “customers” table.

Hint: We want you to display the data type of each column present in the
“customers” table.
B. Get the time range between which the orders were placed.

Hint: We want you to get the date & time when the first and last orders in our
dataset were placed.

C. Count the Cities & States of customers who ordered during the given period.

Hint: We want you to count the number of unique cities & states where orders
were placed by the customers during the given time period.
______________________________________________________________________________

II. In-depth Exploration:


A. Is there a growing trend in the no. of orders placed over the past years?

Hint: We want you to find out if no. of orders placed has increased gradually in
each month, over the past years.

B. Can we see some kind of monthly seasonality in terms of the no. of orders being
placed?

Hint: We want you to find out if the no. of orders placed are at peak during
certain months.

C. During what time of the day, do the Brazilian customers mostly place their
orders? (Dawn, Morning, Afternoon or Night)
● 0-6 hrs : Dawn
● 7-12 hrs : Mornings
● 13-18 hrs : Afternoon
● 19-23 hrs : Night

Hint: We want you to categorize the hours of a day into the given time brackets/
intervals and find out during which intervals the Brazilian customers usually
order the most.
______________________________________________________________________________

III. Evolution of E-commerce orders in the Brazil region:


A. Get the month on month no. of orders placed in each state.

Hint: We want you to get the no. of orders placed in each state, in each month
by our customers.

B. How are the customers distributed across all the states?


Hint: We want you to get the no. of unique customers present in each state.
______________________________________________________________________________

IV. Impact on Economy: Analyze the money movement by e-commerce by looking at


order prices, freight and others.
A. Get the % increase in the cost of orders from year 2017 to 2018 (include
months between Jan to Aug only).

Hint: You can use the payment_value column in the payments table to get the
cost of orders.

B. Calculate the Total & Average value of order price for each state.

Hint: We want you to fetch the total price and the average price of orders for
each state.

C. Calculate the Total & Average value of order freight for each state.

Hint: We want you to fetch the total freight value and the average freight value
of orders for each state.
_____________________________________________________________________________________

V. Analysis based on sales, freight and delivery time.


A. Find the no. of days taken to deliver each order from the order’s purchase date
as delivery time.
Also, calculate the difference (in days) between the estimated & actual delivery
date of an order.
Do this in a single query.

Hint: You can calculate the delivery time and the difference between the
estimated & actual delivery date using the given formula:
● time_to_deliver = order_delivered_customer_date -
order_purchase_timestamp
● diff_estimated_delivery = order_estimated_delivery_date -
order_delivered_customer_date

B. Find out the top 5 states with the highest & lowest average freight value.
Hint: We want you to find the top 5 & the bottom 5 states arranged in
increasing order of the average freight value.

C. Find out the top 5 states with the highest & lowest average delivery time.

Hint: We want you to find the top 5 & the bottom 5 states arranged in
increasing order of the average delivery time.

D. Find out the top 5 states where the order delivery is really fast as compared to
the estimated date of delivery.

You can use the difference between the averages of actual & estimated delivery
date to figure out how fast the delivery was for each state.

Hint: Include only the orders that are already delivered.


______________________________________________________________________________

VI. Analysis based on the payments:


A. Find the month on month no. of orders placed using different payment types.

Hint: We want you to count the no. of orders placed using different payment
methods in each month over the past years.

B. Find the no. of orders placed on the basis of the payment installments that have
been paid.

Hint: We want you to count the no. of orders placed based on the no. of
payment installments where at least one installment has been successfully
paid.
____________________________________________________________________________

FAQs
Q. Which platform am I supposed to use for writing queries?
You are advised to use BigQuery as the platform for solving this case study.

Q. I am having issues setting up BigQuery .


For any query related to BigQuery setup, you can refer to this doc - link
Q. How do I get data types of columns in BigQuery?
We can get the data type of a column using the BigQuery INFORMATION_SCHEMA.

Q. What do you mean by month on month in the questions?


Month on Month means for each month starting from 1 (Jan) to 12 (Dec) in natural order.

Q. Do I need to find the time difference or the date difference?


You have to find the difference between the given timestamps in days.

You might also like