Target Data Analyst SQL Interview Questions
1. Identify the top 5 products that have shown the highest increase in weekly sales
over the last quarter.
Create a Table
Create a table named weekly_sales to store weekly sales data for each product.
CREATE TABLE weekly_sales (
    product_id INT,
    product_name VARCHAR(100),
    week_start_date DATE,
    sales INT
);
Insert some sample data representing weekly sales for different products over the last quarter (values below are illustrative).
INSERT INTO weekly_sales (product_id, product_name, week_start_date, sales)
VALUES
    (1, 'Product A', '2024-07-01', 100),
    (1, 'Product A', '2024-09-23', 180),
    (2, 'Product B', '2024-07-01', 90),
    (2, 'Product B', '2024-09-23', 120);
    -- ... additional weekly rows for more products
Calculate the increase in weekly sales and identify the top 5 products.
WITH weekly_difference AS (
    SELECT
        product_id,
        product_name,
        MAX(sales) - MIN(sales) AS sales_increase
    FROM
        weekly_sales
    WHERE
        week_start_date >= '2024-07-01' AND week_start_date <= '2024-09-30' -- Filter last quarter (Q3 2024)
    GROUP BY
        product_id, product_name
)
SELECT
    product_id,
    product_name,
    sales_increase
FROM
    weekly_difference
ORDER BY
    sales_increase DESC
LIMIT 5;
Explanation
1. Insert Data: The data is structured to capture weekly sales across different weeks for multiple products.
2. Calculate Increase: For each product, compute the difference between its highest and lowest weekly sales within the quarter (MAX(sales) - MIN(sales)) as a simple measure of sales_increase.
3. Order and Limit Results: Order the results in descending order of sales_increase and limit to the top 5.
Expected Output
For the provided data, the query identifies the top 5 products with the highest increase in sales.
This includes calculating sales trends for each product over the last quarter and selecting the
best performers.
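If the interviewer asks for true week-over-week growth rather than a max-min spread, a window-function variant can compare each week's sales with the previous week's. This is a sketch assuming the same weekly_sales table; it requires a database with window-function support (e.g., MySQL 8+):

```sql
-- Week-over-week increase per product, keeping each product's best jump
WITH weekly_change AS (
    SELECT
        product_id,
        product_name,
        week_start_date,
        sales - LAG(sales) OVER (
            PARTITION BY product_id ORDER BY week_start_date
        ) AS wow_increase  -- NULL for each product's first week
    FROM weekly_sales
)
SELECT product_id, product_name, MAX(wow_increase) AS best_weekly_increase
FROM weekly_change
GROUP BY product_id, product_name
ORDER BY best_weekly_increase DESC
LIMIT 5;
```

LAG looks back one row within each product's weekly timeline, so the difference reflects consecutive weeks rather than the overall spread.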
2. Calculate the percentage of out-of-stock items for each store and identify the stores that exceed 20% out-of-stock items on average.
Create a Table
Create a table named inventory_status to store inventory data for various stores.
CREATE TABLE inventory_status (
    store_id INT,
    store_name VARCHAR(100),
    item_id INT,
    item_name VARCHAR(100),
    in_stock INT -- 1 = in stock, 0 = out of stock (assumed status column)
);
Insert sample inventory records (values are illustrative and chosen to match the expected output).
INSERT INTO inventory_status (store_id, store_name, item_id, item_name, in_stock)
VALUES
    (1, 'Store A', 101, 'Item 1', 1),
    (1, 'Store A', 102, 'Item 2', 0),
    (2, 'Store B', 103, 'Item 3', 0),
    (2, 'Store B', 104, 'Item 4', 0),
    (2, 'Store B', 105, 'Item 5', 0),
    (2, 'Store B', 106, 'Item 6', 1),
    (3, 'Store C', 107, 'Item 7', 1),
    (4, 'Store D', 108, 'Item 8', 0);
WITH out_of_stock_percentage AS (
    SELECT
        store_id,
        store_name,
        COUNT(CASE WHEN in_stock = 0 THEN 1 END) * 100.0 / COUNT(*) AS out_of_stock_pct
    FROM
        inventory_status
    GROUP BY
        store_id, store_name
)
SELECT
    store_id,
    store_name,
    out_of_stock_pct
FROM
    out_of_stock_percentage
WHERE
    out_of_stock_pct > 20;
Explanation
1. Count Out-of-Stock Items: Use a conditional count (CASE WHEN in_stock = 0) to tally out-of-stock items per store.
2. Calculate Percentage: Divide the count of out-of-stock items by the total count of items in the store (COUNT(*)) and multiply by 100 to get the percentage.
3. Filter by Threshold: Use WHERE out_of_stock_pct > 20 to identify stores with more than 20% out-of-stock items.
Expected Output
store_id  store_name  out_of_stock_pct
1         Store A     50.0
2         Store B     75.0
4         Store D     100.0
3. Find products that were consistently sold in every store across a region but saw no sales in at least one store last month.
Create a Table
Create a table named store_sales to store sales data for various products in different stores.
CREATE TABLE store_sales (
    store_id INT,
    store_name VARCHAR(100),
    product_id INT,
    product_name VARCHAR(100),
    sales INT,
    sales_date DATE
);
Insert sample records to represent sales data across different stores and products (values are illustrative; rows with sales = 0 represent stores that recorded no sales).
INSERT INTO store_sales (store_id, store_name, product_id, product_name, sales, sales_date)
VALUES
    (1, 'Store 1', 101, 'Product X', 50, '2024-12-05'),
    (2, 'Store 2', 101, 'Product X', 40, '2024-12-06'),
    (1, 'Store 1', 102, 'Product Y', 30, '2024-12-05'),
    (2, 'Store 2', 102, 'Product Y', 0, '2024-12-07'),
    (1, 'Store 1', 103, 'Product Z', 0, '2024-12-08'),
    (2, 'Store 2', 103, 'Product Z', 25, '2024-12-06');
Identify products that were consistently sold in all stores but saw no sales in at least one store in the last month.
WITH all_stores_sales AS (
    SELECT
        product_id,
        product_name
    FROM
        store_sales
    GROUP BY
        product_id, product_name
    HAVING
        COUNT(DISTINCT store_id) = (SELECT COUNT(DISTINCT store_id) FROM store_sales)
),
last_month_sales AS (
    SELECT
        product_id,
        product_name,
        store_id,
        SUM(sales) AS total_sales
    FROM
        store_sales
    WHERE
        sales_date >= CURRENT_DATE - INTERVAL 1 MONTH
        AND sales_date < CURRENT_DATE
    GROUP BY
        product_id, product_name, store_id
),
zero_sales AS (
    SELECT
        product_id,
        product_name
    FROM
        last_month_sales
    WHERE
        total_sales = 0
    GROUP BY
        product_id, product_name
)
SELECT DISTINCT
    a.product_id,
    a.product_name
FROM
    all_stores_sales a
JOIN
    zero_sales z
ON
    a.product_id = z.product_id;
Explanation
1. Consistent Sellers:
o Group all sales by product and keep only products recorded in every distinct store (HAVING COUNT(DISTINCT store_id) equal to the total store count).
2. Last Month's Sales:
o Use WHERE to filter sales data for the previous month using CURRENT_DATE - INTERVAL 1 MONTH.
o Group by product and store, summing the sales, and identify products with zero sales.
3. Combine:
o Join consistent sales data with zero-sales data to find products meeting both conditions.
Expected Output
product_id product_name
102 Product Y
103 Product Z
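Note that the zero_sales CTE only catches stores that explicitly recorded sales = 0 rows. An alternative formulation uses NOT EXISTS to also catch stores with no row at all for a product last month. This is a sketch against the same store_sales table:

```sql
-- Every (product, store) pair with no positive sales in the last month
SELECT pr.product_id, pr.product_name, st.store_id
FROM (SELECT DISTINCT product_id, product_name FROM store_sales) pr
CROSS JOIN (SELECT DISTINCT store_id FROM store_sales) st
WHERE NOT EXISTS (
    SELECT 1
    FROM store_sales s
    WHERE s.product_id = pr.product_id
      AND s.store_id = st.store_id
      AND s.sales > 0
      AND s.sales_date >= CURRENT_DATE - INTERVAL 1 MONTH
);
```

The cross join enumerates every product-store combination, so a missing row counts the same as an explicit zero.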
4. Identify retained customers who made purchases in each of the last 6 months.
Create a Table
Create a table named customer_purchases to store customer purchase records.
CREATE TABLE customer_purchases (
    customer_id INT,
    customer_name VARCHAR(100),
    purchase_date DATE,
    purchase_amount DECIMAL(10, 2)
);
Insert sample purchase records (values are illustrative; Alice and Charlie purchase in every one of the last 6 months, Bob does not).
INSERT INTO customer_purchases (customer_id, customer_name, purchase_date, purchase_amount)
VALUES
    (1, 'Alice', '2024-07-05', 120.00),
    (2, 'Bob', '2024-07-10', 80.00),
    (3, 'Charlie', '2024-07-15', 60.00),
    (1, 'Alice', '2024-08-03', 95.00),
    (3, 'Charlie', '2024-08-18', 40.00);
    -- ... one row per retained customer for each remaining month
WITH months_data AS (
    SELECT
        customer_id,
        DATE_FORMAT(purchase_date, '%Y-%m') AS purchase_month
    FROM
        customer_purchases
    WHERE
        purchase_date >= CURRENT_DATE - INTERVAL 6 MONTH
),
monthly_count AS (
    SELECT
        customer_id,
        COUNT(DISTINCT purchase_month) AS months_purchased
    FROM
        months_data
    GROUP BY
        customer_id
)
SELECT
    customer_id
FROM
    monthly_count
WHERE
    months_purchased = 6;
Explanation
1. Filter Date Range:
o Use WHERE purchase_date >= CURRENT_DATE - INTERVAL 6 MONTH to keep only the last 6 months of purchases.
2. Extract Month-Year:
o Use DATE_FORMAT(purchase_date, '%Y-%m') to extract the month and year for grouping.
3. Count Distinct Months:
o Count the number of distinct months each customer made purchases in using COUNT(DISTINCT purchase_month).
4. Filter Retained Customers:
o Retained customers are those who made purchases in exactly 6 distinct months (WHERE months_purchased = 6).
Sample Output
customer_id
1
3
• Customer 1 (Alice) and Customer 3 (Charlie) made purchases every month for the last 6 months, indicating high retention.
5. Explain how indexing works in SQL and how you would use it to optimize a query
that involves multiple joins on a large dataset of store transactions.
How Indexing Works in SQL
Indexing in SQL is a performance optimization technique that speeds up data retrieval. An index
is a separate data structure that SQL databases maintain to allow faster lookups for specific
columns or combinations of columns. Think of it as a table of contents in a book—it helps you
quickly locate the page containing the information you need without scanning the entire book.
1. Structure: Most indexes are implemented as B-trees or hash tables, depending on the
database and type of index.
Example Scenario
Consider a query that joins store transactions with store and product details (table and column names are illustrative):
SELECT
    t.transaction_id, s.store_name, p.product_name, t.amount
FROM
    transactions t
JOIN
    stores s ON t.store_id = s.store_id
JOIN
    products p ON t.product_id = p.product_id
WHERE
    t.transaction_date >= '2024-01-01';
Indexing Strategy
o The query joins on store_id and product_id. Index these columns because they are used in the ON clause.
o The WHERE clause filters on transaction_date, making it another strong index candidate.
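The strategy above can be sketched with hypothetical index definitions (index and table names are illustrative, not part of a real schema):

```sql
-- Single-column indexes on the fact table's join keys
CREATE INDEX idx_transactions_store_id   ON transactions (store_id);
CREATE INDEX idx_transactions_product_id ON transactions (product_id);

-- Index on the column filtered in the WHERE clause
CREATE INDEX idx_transactions_date ON transactions (transaction_date);

-- Optionally, a composite index covering join key + filter in one structure
CREATE INDEX idx_transactions_store_date ON transactions (store_id, transaction_date);
```

A composite index helps when queries routinely combine the same join key and date filter; its leading column should be the one most often used alone.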
Benefits
1. Efficient Lookup:
o Instead of performing a full table scan, the database uses the index to quickly locate matching rows in the transactions, stores, and products tables.
2. Reduced I/O:
o Indexing reduces disk reads, which are the most expensive part of query execution in large datasets.
3. Faster Joins:
o When multiple tables are joined, indexes allow SQL to fetch matching rows from each table faster.
Considerations
1. Index Overhead:
o Indexes take additional disk space and slow down INSERT, UPDATE, and DELETE operations because they need to update the indexes.
2. Selective Indexing:
o Avoid indexing every column; focus on frequently queried and joined columns.
3. Index Maintenance:
o Periodically monitor and optimize indexes, especially for tables with high
transaction volumes.
Conclusion
Indexes can dramatically improve the performance of queries involving multiple joins on large
datasets. In the given example, indexing the join columns (store_id, product_id) and filtering
columns (transaction_date) enables faster lookups and efficient join processing, leading to
significant query optimization.
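One way to confirm that the planner actually uses these indexes is the EXPLAIN statement; the exact output format varies by database, and the table and column names below are the same illustrative ones as above:

```sql
-- Inspect the query plan: look for index lookups instead of full table scans
EXPLAIN
SELECT t.transaction_id, s.store_name
FROM transactions t
JOIN stores s ON t.store_id = s.store_id
WHERE t.transaction_date >= '2024-01-01';
```

If the plan still shows a full scan, the index may not be selective enough for the filter, or statistics may need refreshing.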
6. Discuss how you would manage and query a database containing billions of rows
of sales data across multiple time zones.
Managing and querying a database with billions of rows of sales data across multiple time
zones requires careful planning to ensure scalability, performance, and accuracy. Below is a
detailed discussion on strategies for managing such a database:
1. Database Design
a. Data Partitioning
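Partition large tables by date so queries touch only the relevant slices. A minimal sketch, assuming MySQL-style range partitioning and an illustrative sales_data schema:

```sql
-- Range-partition the fact table by year of the (UTC) transaction time
CREATE TABLE sales_data (
    transaction_id BIGINT,
    store_id INT,
    region VARCHAR(50),
    sales_amount DECIMAL(10, 2),
    transaction_time DATETIME  -- stored in UTC
)
PARTITION BY RANGE (YEAR(transaction_time)) (
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);
```

Queries filtered on transaction_time then scan only the matching partitions (partition pruning), which matters enormously at billions of rows.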
b. Indexing
• Index frequently filtered and joined columns (e.g., store_id, region, transaction_time).
c. Data Types
• Use DATETIME or TIMESTAMP with time zone support for date fields.
2. Storage Optimization
a. Data Compression
• Enable table or columnar compression to reduce the storage footprint and I/O for rarely updated historical rows.
b. Archiving
Archive older data into cheaper storage (e.g., data from 5+ years ago):
• Use separate databases or file-based storage (e.g., AWS S3, Google Cloud Storage).
c. Data Warehousing
Offload historical data to a data warehouse (e.g., Snowflake, BigQuery) for advanced analytics.
3. Query Optimization
a. Time Zone Normalization
Store all DATETIME values in UTC and convert to local time zones during queries:
SELECT transaction_id,
       CONVERT_TZ(transaction_time, 'UTC', 'America/Chicago') AS local_time -- MySQL; requires time zone tables
FROM sales_data;
b. Aggregate Data
Group or pre-aggregate data so analytical queries scan summaries instead of raw rows:
SELECT store_id,
       SUM(sales_amount) AS total_sales
FROM sales_data
GROUP BY store_id;
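This aggregation can also be materialized into a summary table that dashboards query instead of the raw fact table. A sketch, with an illustrative table name:

```sql
-- Daily per-store summary, bucketed on the UTC calendar day
CREATE TABLE daily_store_sales AS
SELECT store_id,
       DATE(transaction_time) AS sales_day,
       SUM(sales_amount) AS total_sales
FROM sales_data
GROUP BY store_id, DATE(transaction_time);
```

The summary table is orders of magnitude smaller than the raw data; it would be refreshed periodically (e.g., by a nightly ETL job) rather than kept transactionally consistent.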
4. Scalability
a. Horizontal Scaling
• Shard or replicate data across multiple servers (e.g., by region or store) to spread read and write load.
b. Vertical Scaling
• Upgrade hardware resources (e.g., CPU, memory, SSDs) for better performance.
c. Distributed Databases
• Use distributed systems like Google Bigtable, Cassandra, or Amazon DynamoDB for
high availability and scalability.
5. Tooling
• ETL Tools: Use Apache Spark, Talend, or Airflow for efficient data processing and loading.
• Database Monitoring: Use tools like Percona Monitoring and Management (PMM) or AWS CloudWatch.
• Data Analytics Platforms: Use tools like Tableau or Power BI for visualization.
6. Real-World Example
Suppose you need to analyze sales trends in the last quarter for North America across multiple
time zones:
1. Write an optimized query that filters data by region, aggregates sales, and accounts for time zones (region value and dates are illustrative):
SELECT region,
       SUM(sales_amount) AS total_sales
FROM sales_data
WHERE region = 'North America'
  AND transaction_time >= '2024-07-01 00:00:00' -- stored in UTC
  AND transaction_time <  '2024-10-01 00:00:00'
GROUP BY region;
7. Conclusion
By following these strategies, you can ensure high performance, scalability, and accuracy in
handling large datasets across multiple time zones.
7. In the case of seasonal promotions, how would you design an SQL query to
measure the effectiveness of discounts on specific product categories?
To measure the effectiveness of discounts on specific product categories during seasonal
promotions, you can design an SQL query that compares sales performance metrics such as
total sales, units sold, or average revenue per product before and during the promotion period.
Here's a structured approach:
1. Baseline Performance: total sales and units sold in a comparable non-discounted period before the promotion.
2. Promotion Performance: total sales and units sold while the discount was active.
3. Effectiveness Metrics: the percentage change in units and sales between the two periods.
Sample Tables
-- Sample sales_data (values are illustrative and chosen to match the output below)
CREATE TABLE sales_data (
    product_id INT,
    category VARCHAR(50),
    units_sold INT,
    sale_amount DECIMAL(10, 2),
    discount_applied INT, -- 1 = discounted, 0 = full price
    sale_date DATE
);
INSERT INTO sales_data (product_id, category, units_sold, sale_amount, discount_applied, sale_date)
VALUES
    (1, 'Electronics', 3, 350.00, 0, '2024-11-15'),
    (1, 'Electronics', 5, 400.00, 1, '2024-12-10'),
    (2, 'Clothing', 3, 75.00, 0, '2024-11-20'),
    (2, 'Clothing', 3, 75.00, 1, '2024-12-12');
-- Sample products
CREATE TABLE products (
    product_id INT,
    product_name VARCHAR(100)
);
INSERT INTO products
VALUES (1, 'Smart TV'), (2, 'T-Shirt');
1. Compare sales and units sold before and during the promotion.
SQL Query
WITH baseline AS (
    SELECT
        category,
        SUM(units_sold) AS total_units_before,
        SUM(sale_amount) AS total_sales_before
    FROM
        sales_data
    WHERE
        sale_date < '2024-12-01' -- pre-promotion period (illustrative date)
        AND discount_applied = 0
    GROUP BY
        category
),
promotion AS (
    SELECT
        category,
        SUM(units_sold) AS total_units_during,
        SUM(sale_amount) AS total_sales_during
    FROM
        sales_data
    WHERE
        sale_date >= '2024-12-01' -- promotion period (illustrative date)
        AND discount_applied = 1
    GROUP BY
        category
)
SELECT
    p.category,
    b.total_units_before,
    p.total_units_during,
    ROUND((p.total_units_during - b.total_units_before) * 100.0 / b.total_units_before, 2) AS units_change_pct,
    b.total_sales_before,
    p.total_sales_during,
    ROUND((p.total_sales_during - b.total_sales_before) * 100.0 / b.total_sales_before, 2) AS sales_change_pct
FROM
    baseline b
JOIN
    promotion p ON b.category = p.category;
Explanation
1. Baseline CTE:
o Calculates total units sold and sales amount before the promotion (non-
discounted period).
2. Promotion CTE:
o Calculates total units sold and sales amount during the promotion (discounted
period).
3. Final Query:
o Joins the two CTEs on category and computes the percentage change in units and sales between the baseline and promotion periods.
Step 4: Output
category     total_units_before  total_units_during  units_change_pct  total_sales_before  total_sales_during  sales_change_pct
Electronics  3                   5                   66.67             350                 400                 14.29
Clothing     3                   3                   0.00              75                  75                  0.00
Interpretation
1. Units Change: the percentage change in units sold during the promotion versus the baseline (units_change_pct).
2. Sales Change: the percentage change in sales revenue during the promotion versus the baseline (sales_change_pct).
Possible Extensions
1. Product Segmentation: break the results down by individual product to see which items drive the category-level change.
2. Promotion ROI: compare incremental revenue against the cost of the discounts to estimate return on investment.
This approach provides actionable insights into the effectiveness of discounts, allowing for
data-driven promotional strategies.
8. Explain the difference between OLTP and OLAP databases, and provide
examples of how Target might use each for its operations.
The key difference between OLTP (Online Transaction Processing) and OLAP (Online Analytical
Processing) databases lies in their purpose, structure, and usage.
Purpose
OLTP systems handle day-to-day transactional operations such as orders, payments, and inventory updates, optimized for many small, fast reads and writes.
Characteristics
1. Data Model: Normalized (typically 3NF) to avoid redundancy and keep writes fast.
2. Data Volume: Holds current transactional data with limited historical data.
Example Use Cases
• Order Management: Handles customer orders and tracks their fulfillment status.
Example Technology
• Relational databases such as MySQL, PostgreSQL, Oracle, or SQL Server.
Purpose
OLAP systems support complex analytical queries over large volumes of historical data for reporting and strategic decision-making.
Characteristics
1. Data Model: Denormalized, often stored in a star or snowflake schema for faster querying.
2. Data Volume: Large volumes of historical data, often integrated from multiple sources.
Example Use Cases
• Sales Analysis: Analyzing historical sales data to identify trends, seasonal patterns,
and top-selling products.
• Profitability Analysis: Comparing sales and operational costs across regions or stores
to assess profitability.
Example Technology
• Data warehouses such as Snowflake, Google BigQuery, or Amazon Redshift.
Comparison Table
Aspect            OLTP                        OLAP
Data Model        Normalized (3NF)            Denormalized (Star/Snowflake schema)
Example Use Case  Real-time inventory update  Monthly sales trend analysis
1. OLTP Example:
▪ Record a customer's checkout in real time, updating order and inventory tables as the transaction completes.
2. OLAP Example:
▪ Identify the most popular products in different regions during the holiday season.
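The two workloads can be contrasted with a short sketch (table names and values are illustrative):

```sql
-- OLTP: a single, fast transactional write at checkout time
INSERT INTO orders (order_id, customer_id, order_time, total_amount)
VALUES (98231, 517, '2024-12-20 14:32:05', 89.97);

-- OLAP: a large analytical read over months of history
SELECT region, product_name, SUM(units_sold) AS total_units
FROM sales_fact
WHERE sale_date BETWEEN '2024-11-01' AND '2024-12-31'
GROUP BY region, product_name
ORDER BY total_units DESC;
```

The OLTP statement touches one row and must commit in milliseconds; the OLAP query scans millions of rows and is tuned for throughput, which is why the two run on differently designed systems.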
Conclusion
Target would rely on OLTP systems for operational efficiency and OLAP systems for strategic
decision-making. Together, they create a robust ecosystem that supports both transactional
integrity and insightful analytics.