SQL Retail Sales Project
Objectives
1. Set up a retail sales database: Create and populate a retail sales database with
the sales data provided.
2. Data Cleaning: Identify and remove any records with missing or null values.
3. Exploratory Data Analysis (EDA): Perform basic exploratory data analysis to
understand the dataset.
4. Business Analysis: Use SQL to answer specific business questions and derive
insights from the sales data.
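Objectives 2 and 3 can be sketched as follows. This is a minimal example rather than the project's exact cleaning script: it checks only the columns that appear in the queries later in this document (sale_date, sale_time, customer_id, age, category, total_sale), so the remaining columns of the table would need to be added to the conditions.

```sql
-- Objective 2: list rows that have a NULL in any of the checked columns.
SELECT *
FROM retail_sales
WHERE sale_date IS NULL
   OR sale_time IS NULL
   OR customer_id IS NULL
   OR age IS NULL
   OR category IS NULL
   OR total_sale IS NULL;

-- After reviewing those rows, remove them.
DELETE FROM retail_sales
WHERE sale_date IS NULL
   OR sale_time IS NULL
   OR customer_id IS NULL
   OR age IS NULL
   OR category IS NULL
   OR total_sale IS NULL;

-- Objective 3: basic exploratory counts.
SELECT COUNT(*) AS total_records FROM retail_sales;
SELECT COUNT(DISTINCT customer_id) AS unique_customers FROM retail_sales;
SELECT DISTINCT category FROM retail_sales;
```

Running the SELECT before the DELETE makes it possible to inspect what will be removed.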
Project Structure
1. Database Setup
• Database Creation: The project starts by creating a database named
p1_retail_db.
• Table Creation: A table named retail_sales is created to store the sales data.
The table structure includes columns for transaction ID, sale date, sale time,
customer ID, gender, age, product category, quantity sold, price per unit, cost of
goods sold (COGS), and total sale amount.
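The setup described above might look like the sketch below. The names sale_date, sale_time, customer_id, age, category, and total_sale come from the queries in this document; the other column names and all of the data types are assumptions chosen for illustration and should be adjusted to match the actual data file.

```sql
CREATE DATABASE p1_retail_db;

-- Connect to p1_retail_db before creating the table.
CREATE TABLE retail_sales (
    transaction_id INT PRIMARY KEY,  -- assumed column name
    sale_date      DATE,
    sale_time      TIME,
    customer_id    INT,
    gender         VARCHAR(10),      -- assumed column name and length
    age            INT,
    category       VARCHAR(35),      -- assumed length
    quantity       INT,              -- assumed column name
    price_per_unit NUMERIC(10, 2),   -- assumed column name and precision
    cogs           NUMERIC(10, 2),   -- assumed column name and precision
    total_sale     NUMERIC(10, 2)    -- assumed precision
);
```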
The following SQL queries were developed to answer specific business questions:
1. Write a SQL query to retrieve all columns for sales made on '2022-11-05':
SELECT *
FROM retail_sales
WHERE sale_date = '2022-11-05';
3. Write a SQL query to calculate the total sales (total_sale) and the number of orders for each category:
SELECT
    category,
    SUM(total_sale) AS net_sale,
    COUNT(*) AS total_orders
FROM retail_sales
GROUP BY category;
4. Write a SQL query to find the average age of customers who purchased
items from the 'Beauty' category:
SELECT
    ROUND(AVG(age), 2) AS avg_age
FROM retail_sales
WHERE category = 'Beauty';
5. Write a SQL query to find all transactions where the total_sale is greater
than 1000:
SELECT *
FROM retail_sales
WHERE total_sale > 1000;
7. Write a SQL query to calculate the average sale for each month and find the
best-selling month in each year:
SELECT
    year,
    month,
    avg_sale
FROM
(
    -- Average sale per (year, month), ranked within each year.
    SELECT
        EXTRACT(YEAR FROM sale_date) AS year,
        EXTRACT(MONTH FROM sale_date) AS month,
        AVG(total_sale) AS avg_sale,
        -- Named sale_rank because RANK is a reserved word in some databases.
        RANK() OVER (
            PARTITION BY EXTRACT(YEAR FROM sale_date)
            ORDER BY AVG(total_sale) DESC
        ) AS sale_rank
    FROM retail_sales
    GROUP BY 1, 2
) AS t1
WHERE sale_rank = 1;
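The query above returns only the top-ranked month of each year. To inspect the monthly averages being ranked (the first half of the task), the inner aggregation can be run on its own. This is a small sketch that reuses the same expressions; the ORDER BY is added only for readability.

```sql
-- Average sale for every (year, month) pair, in chronological order.
SELECT
    EXTRACT(YEAR FROM sale_date) AS year,
    EXTRACT(MONTH FROM sale_date) AS month,
    AVG(total_sale) AS avg_sale
FROM retail_sales
GROUP BY 1, 2
ORDER BY 1, 2;
```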
8. Write a SQL query to find the top 5 customers based on the highest total sales:
SELECT
    customer_id,
    SUM(total_sale) AS total_sales
FROM retail_sales
GROUP BY customer_id
ORDER BY total_sales DESC
LIMIT 5;
9. Write a SQL query to find the number of unique customers who purchased
items from each category:
SELECT
    category,
    COUNT(DISTINCT customer_id) AS cnt_unique_cs
FROM retail_sales
GROUP BY category;
10. Write a SQL query to assign each sale to a shift and count the orders per shift
(for example: Morning before 12, Afternoon between 12 and 17, Evening after 17):
WITH hourly_sale AS
(
    -- Label every sale with a shift based on the hour of sale_time.
    SELECT *,
        CASE
            WHEN EXTRACT(HOUR FROM sale_time) < 12 THEN 'Morning'
            WHEN EXTRACT(HOUR FROM sale_time) BETWEEN 12 AND 17 THEN 'Afternoon'
            ELSE 'Evening'
        END AS shift
    FROM retail_sales
)
SELECT
    shift,
    COUNT(*) AS total_orders
FROM hourly_sale
GROUP BY shift;
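Because the CASE expression compares whole hours, a sale at any time from 17:00:00 to 17:59:59 still falls in the Afternoon shift. The sketch below applies the same expression to a few literal boundary times to make that visible. The VALUES list with a column alias is PostgreSQL-style syntax and is an assumption about the target database; the sample times are illustrative and are not taken from the dataset.

```sql
-- Apply the shift rule to hand-picked boundary times.
SELECT
    t AS sale_time,
    CASE
        WHEN EXTRACT(HOUR FROM t) < 12 THEN 'Morning'
        WHEN EXTRACT(HOUR FROM t) BETWEEN 12 AND 17 THEN 'Afternoon'
        ELSE 'Evening'
    END AS shift
FROM (VALUES
    (TIME '11:59:59'),  -- Morning
    (TIME '12:00:00'),  -- Afternoon
    (TIME '17:59:59'),  -- Afternoon (hour is still 17)
    (TIME '18:00:00')   -- Evening
) AS v(t);
```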
Conclusion
This project serves as an introduction to SQL for data analysts, covering
database setup, data cleaning, exploratory data analysis, and business-driven SQL
queries. Its findings can help drive business decisions by revealing sales
patterns, customer behavior, and product performance.