Interview Preparation Data Collection-01
SQL Questions
Q.2 Explain the difference between INNER JOIN and OUTER JOIN, with examples.
Ans. 1. INNER JOIN: Returns only matching records from both tables.
SELECT e.name, d.department_name
FROM employees e
INNER JOIN departments d ON e.department_id = d.department_id;
• Output: Only employees who belong to a department.
2. LEFT OUTER JOIN:
Returns all records from the left table and the matching records from the right table. If there is no match, the right table's columns are NULL.
SELECT e.name, d.department_name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id;
• Output: All employees, with department info where available.
3. RIGHT OUTER JOIN:
Returns all records from the right table and the matching records from the left table (NULLs where there is no match).
4. FULL OUTER JOIN:
Returns all records from both tables, matched where possible and filled with NULLs otherwise.
Key Difference:
• INNER JOIN = intersection (matched data only)
• OUTER JOIN = matched rows plus the unmatched rows from the left (LEFT), right (RIGHT), or both (FULL) tables, with NULLs in the missing columns
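To see the difference on real rows, here is a minimal sketch that runs the two queries above through Python's built-in sqlite3 module on a small, made-up employees/departments sample. SQLite versions older than 3.39 have no RIGHT or FULL JOIN, so the FULL OUTER JOIN is emulated here as a LEFT JOIN plus the unmatched department rows.

```python
import sqlite3

# Hypothetical sample: Cara has no department, Finance has no employees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (department_id INTEGER PRIMARY KEY, department_name TEXT);
CREATE TABLE employees (name TEXT, department_id INTEGER);
INSERT INTO departments VALUES (1, 'Sales'), (2, 'IT'), (3, 'Finance');
INSERT INTO employees VALUES ('Asha', 1), ('Ben', 2), ('Cara', NULL);
""")

inner_rows = conn.execute("""
    SELECT e.name, d.department_name
    FROM employees e
    INNER JOIN departments d ON e.department_id = d.department_id
    ORDER BY e.name
""").fetchall()

left_rows = conn.execute("""
    SELECT e.name, d.department_name
    FROM employees e
    LEFT JOIN departments d ON e.department_id = d.department_id
    ORDER BY e.name
""").fetchall()

# FULL OUTER JOIN emulated: LEFT JOIN rows plus departments nobody matched.
full_rows = conn.execute("""
    SELECT e.name, d.department_name
    FROM employees e
    LEFT JOIN departments d ON e.department_id = d.department_id
    UNION ALL
    SELECT NULL, d.department_name
    FROM departments d
    WHERE NOT EXISTS (
        SELECT 1 FROM employees e WHERE e.department_id = d.department_id
    )
""").fetchall()

print(inner_rows)  # Cara is dropped: she has no matching department
print(left_rows)   # Cara is kept, with department_name = None (SQL NULL)
print(full_rows)   # also includes (None, 'Finance')
```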
Q.3 Write a query to fetch the second-highest salary from an employee table.
Ans. Option 1: Using DISTINCT, ORDER BY, and LIMIT (MySQL/PostgreSQL)
SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;
Option 2: Using a subquery (Generic SQL)
SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
Explanation:
• The subquery fetches the highest salary.
• The outer query finds the maximum salary below the highest one, which is the second-highest distinct salary.
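As a quick check, this sketch runs both options against a hypothetical employees table (via Python's sqlite3 module) where the top salary appears twice, to confirm that neither option counts the duplicate as the second-highest.

```python
import sqlite3

# Hypothetical data: the top salary (200) is shared by two employees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, salary INTEGER);
INSERT INTO employees VALUES ('A', 200), ('B', 200), ('C', 150), ('D', 100);
""")

# Option 1: skip the highest distinct salary, take the next one.
opt1 = conn.execute("""
    SELECT DISTINCT salary FROM employees
    ORDER BY salary DESC
    LIMIT 1 OFFSET 1
""").fetchone()[0]

# Option 2: the largest salary strictly below the overall maximum.
opt2 = conn.execute("""
    SELECT MAX(salary) FROM employees
    WHERE salary < (SELECT MAX(salary) FROM employees)
""").fetchone()[0]

print(opt1, opt2)
```

One behavioral difference: if every employee earns the same amount, Option 1 returns no row at all, while Option 2 returns a single NULL.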
Q.4 How do you use GROUP BY and HAVING together? Provide an example.
Ans. Use GROUP BY to group rows and HAVING to filter the aggregated groups. WHERE, by contrast, filters individual rows before any grouping happens.
SELECT department_id, COUNT(*) AS emp_count
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 5;
Explanation:
• Groups employees by department.
• Filters groups where the count of employees is more than 5.
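The ordering matters: WHERE runs before grouping and HAVING after. The sketch below (Python sqlite3, hypothetical data and an illustrative salary cut-off) runs the query above and then a variant that combines both clauses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department_id INTEGER, salary INTEGER)")
# Hypothetical data: department 10 has 6 employees, department 20 has 3.
rows = (
    [(f"e{i}", 10, 40000 + i * 1000) for i in range(6)]
    + [(f"f{i}", 20, 60000) for i in range(3)]
)
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)

# HAVING filters the groups after COUNT(*) has been computed.
big_depts = conn.execute("""
    SELECT department_id, COUNT(*) AS emp_count
    FROM employees
    GROUP BY department_id
    HAVING COUNT(*) > 5
""").fetchall()

# WHERE removes rows first, so it changes what each group counts.
filtered = conn.execute("""
    SELECT department_id, COUNT(*) AS emp_count
    FROM employees
    WHERE salary >= 43000
    GROUP BY department_id
    HAVING COUNT(*) > 2
    ORDER BY department_id
""").fetchall()

print(big_depts)
print(filtered)
```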
Ans. Definition: A window function performs a calculation across a set of table rows related to the current row, without collapsing those rows into one output row the way GROUP BY does.
Syntax:
FUNCTION_NAME() OVER (PARTITION BY column ORDER BY column)
Example: ROW_NUMBER()
Assigns a unique sequential number to each row within a partition.
SELECT
name, department, salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num
FROM employees;
• Each employee gets a sequential number within their department, ordered by salary (highest first). Employees with equal salaries still get different numbers, in no guaranteed order.
Example: RANK()
Assigns the same rank to rows with equal values, but skips the next rank(s).
SELECT
name, department, salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank_num
FROM employees;
• If the top 2 employees in a department have the same salary, both get rank 1 and the next one gets rank 3 (DENSE_RANK() would give it rank 2).
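A side-by-side comparison makes the tie handling concrete. This sketch uses Python's sqlite3 module on hypothetical data and assumes the bundled SQLite is 3.25 or newer (the first version with window functions). A `name` tie-breaker is added to ROW_NUMBER() only so the example's output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
INSERT INTO employees VALUES
  ('Ann', 'IT', 9000), ('Bob', 'IT', 9000), ('Cid', 'IT', 7000),
  ('Dee', 'HR', 5000);
""")

rows = conn.execute("""
    SELECT name, department, salary,
           -- name breaks ties so the numbering is reproducible
           ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC, name) AS row_num,
           RANK()       OVER (PARTITION BY department ORDER BY salary DESC) AS rank_num,
           DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rank_num
    FROM employees
    ORDER BY department, row_num
""").fetchall()

for r in rows:
    print(r)
```

In the IT partition, Ann and Bob tie at 9000: ROW_NUMBER() gives them 1 and 2, RANK() gives both 1 and then jumps to 3 for Cid, and DENSE_RANK() gives Cid 2.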
Q.7 Write a query to fetch the top 3 performing products based on sales.
Ans. Assume table sales_data has:
product_id, product_name, total_sales
SELECT product_id, product_name, total_sales
FROM sales_data
ORDER BY total_sales DESC
LIMIT 3;
Alternative using RANK() (if ties matter):
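The RANK()-based query itself is missing from this copy. A minimal sketch, assuming the same sales_data columns and run through Python's sqlite3 module (SQLite 3.25+ for window functions) with made-up figures: rank products by total_sales in a subquery, then keep ranks 1 to 3. The subquery is needed because a window function cannot be referenced directly in WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_data (product_id INTEGER, product_name TEXT, total_sales INTEGER);
INSERT INTO sales_data VALUES
  (1, 'Pen', 500), (2, 'Ink', 400), (3, 'Pad', 300), (4, 'Cap', 300), (5, 'Box', 100);
""")

top3 = conn.execute("""
    SELECT product_id, product_name, total_sales
    FROM (
        SELECT product_id, product_name, total_sales,
               RANK() OVER (ORDER BY total_sales DESC) AS sales_rank
        FROM sales_data
    ) AS ranked
    WHERE sales_rank <= 3
    ORDER BY total_sales DESC, product_id
""").fetchall()

print(top3)
```

Pad and Cap tie for third place, so this returns four rows where `LIMIT 3` would silently drop one of them.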
Example:
SELECT city FROM customers
UNION
SELECT city FROM vendors;
→ Returns a unique list of cities.
SELECT city FROM customers
UNION ALL
SELECT city FROM vendors;
→ Returns all cities, including duplicates.
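A quick way to confirm the two behaviors is to run both statements on a small hypothetical sample with Python's sqlite3 module. Note that UNION also removes duplicates that come from a single input (Pune appears twice in customers), and UNION ALL is usually cheaper because it skips that de-duplication step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (city TEXT);
CREATE TABLE vendors (city TEXT);
INSERT INTO customers VALUES ('Pune'), ('Delhi'), ('Pune');
INSERT INTO vendors VALUES ('Delhi'), ('Mumbai');
""")

union_rows = conn.execute(
    "SELECT city FROM customers UNION SELECT city FROM vendors ORDER BY city"
).fetchall()
union_all_rows = conn.execute(
    "SELECT city FROM customers UNION ALL SELECT city FROM vendors ORDER BY city"
).fetchall()

print(union_rows)      # unique cities only
print(union_all_rows)  # every row from both tables, duplicates kept
```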
Q.9 How do you use a CASE statement in SQL? Provide an example.
Ans. CASE lets you write conditional logic in SQL (similar to IF/ELSE).
SELECT name, salary,
CASE
WHEN salary >= 100000 THEN 'High'
WHEN salary >= 50000 THEN 'Medium'
ELSE 'Low'
END AS salary_category
FROM employees;
Explanation:
• Assigns a category based on salary value.
• Works inside SELECT, WHERE, ORDER BY, etc.
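The WHEN branches are evaluated top to bottom and the first true one wins, which is why the `>= 100000` test must come before `>= 50000`. This sketch (Python sqlite3, hypothetical salaries chosen at the boundaries) runs the query above to show where each value lands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, salary INTEGER);
INSERT INTO employees VALUES ('A', 120000), ('B', 100000), ('C', 50000), ('D', 49999);
""")

# 120000 also satisfies salary >= 50000, but the 'High' branch matches first.
rows = conn.execute("""
    SELECT name, salary,
           CASE
               WHEN salary >= 100000 THEN 'High'
               WHEN salary >= 50000 THEN 'Medium'
               ELSE 'Low'
           END AS salary_category
    FROM employees
    ORDER BY salary DESC
""").fetchall()

print(rows)
```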
• SUM(...) OVER (...) calculates a running total per product based on order date.
• PARTITION BY restarts the total for each product, and ORDER BY makes the total accumulate in chronological order.
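The running-total query these bullets describe is not present in this copy. A minimal sketch, assuming a hypothetical orders(product_id, order_date, amount) table and using Python's sqlite3 module (SQLite 3.25+ for window functions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (product_id INTEGER, order_date TEXT, amount INTEGER);
INSERT INTO orders VALUES
  (1, '2024-01-01', 10), (1, '2024-01-03', 5), (1, '2024-01-02', 20),
  (2, '2024-01-01', 7),  (2, '2024-01-05', 3);
""")

rows = conn.execute("""
    SELECT product_id, order_date, amount,
           SUM(amount) OVER (
               PARTITION BY product_id   -- restart the total for each product
               ORDER BY order_date       -- accumulate in date order
           ) AS running_total
    FROM orders
    ORDER BY product_id, order_date
""").fetchall()

for r in rows:
    print(r)
```

With ORDER BY and no explicit frame, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so two orders on the same date would receive the same running total. Add `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` if each row should accumulate individually.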
WITH cte_name AS (
SELECT ...
)
SELECT * FROM cte_name;
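Filling in that template with a concrete, hypothetical case: the sketch below uses a CTE named dept_avg (an illustrative name) to compute each department's average salary, then joins it back to list employees earning above their department's average. It runs through Python's sqlite3 module on made-up data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
INSERT INTO employees VALUES
  ('Ann', 'IT', 9000), ('Bob', 'IT', 7000), ('Dee', 'HR', 5000), ('Eve', 'HR', 6000);
""")

# The CTE names an intermediate result that the main query reuses.
rows = conn.execute("""
    WITH dept_avg AS (
        SELECT department, AVG(salary) AS avg_salary
        FROM employees
        GROUP BY department
    )
    SELECT e.name, e.department, e.salary
    FROM employees e
    JOIN dept_avg a ON e.department = a.department
    WHERE e.salary > a.avg_salary
    ORDER BY e.name
""").fetchall()

print(rows)
```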
Q.16 Write a query to identify customers who have made transactions above $5,000 multiple times.
Ans. Assume transactions table has:
customer_id, transaction_amount
SELECT customer_id, COUNT(*) AS high_value_txns
FROM transactions
WHERE transaction_amount > 5000
GROUP BY customer_id
HAVING COUNT(*) > 1;
Explanation:
• Filters high-value transactions (> $5000).
• Groups them by customer.
• Returns customers who’ve done this more than once.
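The sketch below runs the query on hypothetical transactions with Python's sqlite3 module. Customer 2 has one transaction of exactly 5,000, which the strict `> 5000` filter excludes, so that customer drops out of the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (customer_id INTEGER, transaction_amount REAL);
INSERT INTO transactions VALUES
  (1, 6000), (1, 7500), (1, 100),   -- two high-value transactions
  (2, 9000), (2, 5000),             -- 5000 is not > 5000, so only one
  (3, 5200), (3, 5300), (3, 5400);  -- three high-value transactions
""")

rows = conn.execute("""
    SELECT customer_id, COUNT(*) AS high_value_txns
    FROM transactions
    WHERE transaction_amount > 5000
    GROUP BY customer_id
    HAVING COUNT(*) > 1
    ORDER BY customer_id
""").fetchall()

print(rows)
```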
Q.19 Write a query to find all customers who have not made any purchases in the last 6 months.
Ans. Assume:
• customers(customer_id, name)
• transactions(customer_id, transaction_date)
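The answer query for Q.19 is missing from this copy. One common approach is an anti-join with NOT EXISTS: keep each customer for whom no transaction exists on or after the cut-off date. A minimal sketch in Python's sqlite3 module, assuming dates are stored as ISO 'YYYY-MM-DD' text (so string comparison matches date order) and using a fixed, hypothetical "today" to keep the output reproducible. `date(?, '-6 months')` is SQLite's date arithmetic; PostgreSQL would use `CURRENT_DATE - INTERVAL '6 months'`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, name TEXT);
CREATE TABLE transactions (customer_id INTEGER, transaction_date TEXT);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Cara');  -- Cara never buys
INSERT INTO transactions VALUES
  (1, '2024-06-20'),   -- recent purchase
  (2, '2023-11-02');   -- older than 6 months
""")

as_of = "2024-07-15"  # hypothetical "today"; use date('now') in practice
rows = conn.execute("""
    SELECT c.customer_id, c.name
    FROM customers c
    WHERE NOT EXISTS (
        SELECT 1
        FROM transactions t
        WHERE t.customer_id = c.customer_id
          AND t.transaction_date >= date(?, '-6 months')
    )
    ORDER BY c.customer_id
""", (as_of,)).fetchall()

print(rows)
```

NOT EXISTS is used rather than NOT IN because `NOT IN (subquery)` returns no rows at all if the subquery yields a NULL customer_id.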
4. Conditional checks:
SELECT name,
CASE
WHEN salary IS NULL THEN 'Unknown'
ELSE 'Known'
END AS salary_status
FROM employees;
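The check must use `IS NULL`; a comparison such as `salary = NULL` evaluates to NULL (not true) for every row, so its WHEN branch never fires. This sketch, using Python's sqlite3 module on hypothetical rows, computes both versions side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, salary INTEGER);
INSERT INTO employees VALUES ('A', 50000), ('B', NULL);
""")

rows = conn.execute("""
    SELECT name,
           CASE WHEN salary IS NULL THEN 'Unknown' ELSE 'Known' END AS salary_status,
           -- wrong: '= NULL' is never true, so every row falls through to ELSE
           CASE WHEN salary = NULL THEN 'Unknown' ELSE 'Known' END AS wrong_status
    FROM employees
    ORDER BY name
""").fetchall()

print(rows)
```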
Example:
-- Creating index
CREATE INDEX idx_customer_id ON transactions(customer_id);
• This helps queries like:
SELECT * FROM transactions WHERE customer_id = 101;
Important notes:
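• Too many indexes can slow down INSERT/UPDATE/DELETE, because every index must be updated on each write.
• A standalone index on a low-cardinality column (e.g., gender) is rarely selective enough to be worth it.
• Use composite indexes when queries filter on multiple columns together; column order matters, since the index is most useful when the query filters on its leading column(s).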
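One way to check whether a query actually uses an index is to inspect its plan. This sketch uses SQLite's `EXPLAIN QUERY PLAN` through Python's sqlite3 module on an empty, hypothetical transactions table; other engines have their own `EXPLAIN` output. The exact plan wording varies between SQLite versions, so the comments show typical text only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (transaction_id INTEGER, customer_id INTEGER, amount REAL)"
)

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT * FROM transactions WHERE customer_id = 101"
before = plan(query)  # full table scan, e.g. 'SCAN transactions'
conn.execute("CREATE INDEX idx_customer_id ON transactions(customer_id)")
after = plan(query)   # e.g. 'SEARCH transactions USING INDEX idx_customer_id (customer_id=?)'

print(before)
print(after)
```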
Q.23 Write a query to fetch the maximum transaction amount for each customer.
Ans. Assume a transactions table:
• customer_id: ID of the customer
• transaction_id: Unique transaction ID
• amount: Transaction amount
Query:
SELECT customer_id, MAX(amount) AS max_transaction
FROM transactions
GROUP BY customer_id;
Explanation:
• GROUP BY groups all transactions by customer.
• MAX(amount) returns the highest transaction for each group (customer).
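For a quick sanity check, here is the query above run through Python's sqlite3 module on a few hypothetical rows. Each customer collapses to a single output row that holds the largest amount in that customer's group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (customer_id INTEGER, transaction_id INTEGER, amount REAL);
INSERT INTO transactions VALUES
  (1, 101, 250.0), (1, 102, 900.0), (2, 103, 40.0), (2, 104, 75.5), (3, 105, 10.0);
""")

rows = conn.execute("""
    SELECT customer_id, MAX(amount) AS max_transaction
    FROM transactions
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()

print(rows)
```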
In short:
• OLTP = operational, fast, real-time transactions.
• OLAP = analytical queries over large volumes of historical, slowly changing data.
Q.5 How would you determine the Average Revenue Per User (ARPU) from transaction data?
Ans. ARPU = Total Revenue / Total Number of Users (here, users are the distinct customers who appear in the transactions).
Assume a transactions table:
(transaction_id, customer_id, amount, transaction_date)
SQL Query:
SELECT
SUM(amount) * 1.0 / COUNT(DISTINCT customer_id) AS ARPU
FROM transactions;
Explanation:
• SUM(amount) gets total revenue.
• COUNT(DISTINCT customer_id) counts unique users.
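• Multiplying by 1.0 forces decimal division; in engines where integer / integer truncates (e.g., PostgreSQL, SQL Server), an integer amount column would otherwise give a truncated result.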
You can also compute monthly ARPU by grouping by month (DATE_TRUNC is PostgreSQL syntax; other engines have their own date functions):
SELECT
DATE_TRUNC('month', transaction_date) AS month,
SUM(amount) * 1.0 / COUNT(DISTINCT customer_id) AS monthly_arpu
FROM transactions
GROUP BY month
ORDER BY month;
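The sketch below runs the overall and monthly ARPU queries through Python's sqlite3 module on hypothetical integer amounts. SQLite has no DATE_TRUNC, so `strftime('%Y-%m', ...)` stands in for the month bucket. It also shows why the `* 1.0` matters: SQLite, like PostgreSQL, truncates integer / integer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (transaction_id INTEGER, customer_id INTEGER,
                           amount INTEGER, transaction_date TEXT);
INSERT INTO transactions VALUES
  (1, 1, 10, '2024-01-05'), (2, 1, 20, '2024-01-20'),
  (3, 2, 15, '2024-01-09'), (4, 2, 30, '2024-02-11');
""")

# Without * 1.0: 75 / 2 is integer division and truncates.
truncated = conn.execute(
    "SELECT SUM(amount) / COUNT(DISTINCT customer_id) FROM transactions"
).fetchone()[0]
arpu = conn.execute(
    "SELECT SUM(amount) * 1.0 / COUNT(DISTINCT customer_id) FROM transactions"
).fetchone()[0]

# strftime('%Y-%m', ...) plays the role of DATE_TRUNC('month', ...).
monthly = conn.execute("""
    SELECT strftime('%Y-%m', transaction_date) AS month,
           SUM(amount) * 1.0 / COUNT(DISTINCT customer_id) AS monthly_arpu
    FROM transactions
    GROUP BY month
    ORDER BY month
""").fetchall()

print(truncated, arpu)
print(monthly)
```

Monthly ARPU counts users per month, so a customer active in both months is counted in each, and the monthly figures will not simply average to the overall ARPU.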
Q.6 Describe a scenario where you would use a LEFT JOIN instead of an INNER JOIN.
Ans. Use LEFT JOIN when you want all records from the left table, even if there is no matching record in the right table.
Real-life Scenario: List all customers and their transactions, including customers who haven't made any.
Query:
SELECT c.customer_id, c.name, t.transaction_id, t.amount
FROM customers c
LEFT JOIN transactions t ON c.customer_id = t.customer_id;
Why LEFT JOIN?
• Shows all customers, including those with no transactions (returns NULLs for those).
• Using INNER JOIN would exclude customers with zero activity.
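A common follow-up is counting transactions per customer, including customers with zero. This sketch (Python sqlite3, hypothetical data) shows the LEFT JOIN keeping the inactive customer, while the INNER JOIN version only finds the active one. It counts `t.transaction_id` rather than `*` because COUNT(column) skips the NULLs produced for unmatched rows; COUNT(*) would report 1 for a customer with no transactions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, name TEXT);
CREATE TABLE transactions (transaction_id INTEGER, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben');
INSERT INTO transactions VALUES (10, 1, 99.0), (11, 1, 25.0);
""")

rows = conn.execute("""
    SELECT c.customer_id, c.name, COUNT(t.transaction_id) AS txn_count
    FROM customers c
    LEFT JOIN transactions t ON c.customer_id = t.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY c.customer_id
""").fetchall()

# The INNER JOIN version silently loses Ben, who has no transactions.
inner_count = conn.execute("""
    SELECT COUNT(DISTINCT c.customer_id)
    FROM customers c
    INNER JOIN transactions t ON c.customer_id = t.customer_id
""").fetchone()[0]

print(rows)
print(inner_count)
```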
Q.7 Write a query to calculate YoY (Year-over-Year) growth for a set of transactions.
Ans. Assume a table named transactions with:
(customer_id, transaction_date, amount)
Step 1: Extract year-wise revenue
SELECT
EXTRACT(YEAR FROM transaction_date) AS year,
SUM(amount) AS total_revenue
FROM transactions
GROUP BY EXTRACT(YEAR FROM transaction_date);
Step 2: Calculate YoY Growth using a CTE and Self-Join
WITH yearly_revenue AS (
SELECT
EXTRACT(YEAR FROM transaction_date) AS year,
SUM(amount) AS total_revenue
FROM transactions
GROUP BY EXTRACT(YEAR FROM transaction_date)
)
SELECT
curr.year AS current_year,
curr.total_revenue,
prev.total_revenue AS previous_year_revenue,
ROUND((curr.total_revenue - prev.total_revenue) * 100.0 / NULLIF(prev.total_revenue, 0), 2) AS yoy_growth_percent
FROM yearly_revenue curr
LEFT JOIN yearly_revenue prev
ON curr.year = prev.year + 1
ORDER BY curr.year;
Explanation:
• Joins each year to its previous year.
• Computes YoY growth as a percentage. The earliest year has no previous year, so the LEFT JOIN leaves its growth as NULL.
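The sketch below runs the same CTE-plus-self-join approach through Python's sqlite3 module on hypothetical yearly totals. SQLite has no EXTRACT, so `CAST(strftime('%Y', ...) AS INTEGER)` is used instead. The CAST matters because strftime returns text, and SQLite would never consider text '2023' equal to the integer produced by `prev.year + 1`. Multiplying by 100.0 before dividing avoids integer division, and NULLIF guards against a zero-revenue previous year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (customer_id INTEGER, transaction_date TEXT, amount INTEGER);
INSERT INTO transactions VALUES
  (1, '2022-03-01', 100), (2, '2022-08-15', 100),
  (1, '2023-05-10', 300),
  (2, '2024-02-20', 150);
""")

rows = conn.execute("""
    WITH yearly_revenue AS (
        -- CAST makes year an integer so prev.year + 1 compares correctly
        SELECT CAST(strftime('%Y', transaction_date) AS INTEGER) AS year,
               SUM(amount) AS total_revenue
        FROM transactions
        GROUP BY CAST(strftime('%Y', transaction_date) AS INTEGER)
    )
    SELECT curr.year AS current_year,
           curr.total_revenue,
           prev.total_revenue AS previous_year_revenue,
           ROUND((curr.total_revenue - prev.total_revenue) * 100.0
                 / NULLIF(prev.total_revenue, 0), 2) AS yoy_growth_percent
    FROM yearly_revenue curr
    LEFT JOIN yearly_revenue prev ON curr.year = prev.year + 1
    ORDER BY curr.year
""").fetchall()

for r in rows:
    print(r)
```

Where window functions are available, `LAG(total_revenue) OVER (ORDER BY year)` is an alternative to the self-join. Note, however, that LAG picks the previous row even if a year is missing, while the `prev.year + 1` join does not.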
Q.9 Write a query to find customers who have used more than 2 credit cards for transactions in a given month.
Ans. Assume a transactions table:
(customer_id, card_id, transaction_date)
Query (TO_CHAR is PostgreSQL/Oracle syntax):
SELECT customer_id,
TO_CHAR(transaction_date, 'YYYY-MM') AS txn_month,
COUNT(DISTINCT card_id) AS cards_used
FROM transactions
GROUP BY customer_id, TO_CHAR(transaction_date, 'YYYY-MM')
HAVING COUNT(DISTINCT card_id) > 2;
Explanation:
• Groups by customer_id and month.
• Counts distinct card_id used.
• Filters where more than 2 cards were used in a month.
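The sketch below runs the query on hypothetical data with Python's sqlite3 module, using `strftime('%Y-%m', ...)` in place of TO_CHAR. Customer 2 uses card Y twice in March, but COUNT(DISTINCT card_id) counts it once, so only customer 1's March qualifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (customer_id INTEGER, card_id TEXT, transaction_date TEXT);
INSERT INTO transactions VALUES
  (1, 'A', '2024-03-01'), (1, 'B', '2024-03-09'), (1, 'C', '2024-03-30'),
  (1, 'A', '2024-04-02'),
  (2, 'X', '2024-03-05'), (2, 'Y', '2024-03-06'), (2, 'Y', '2024-03-07');
""")

rows = conn.execute("""
    SELECT customer_id,
           strftime('%Y-%m', transaction_date) AS txn_month,
           COUNT(DISTINCT card_id) AS cards_used
    FROM transactions
    GROUP BY customer_id, strftime('%Y-%m', transaction_date)
    HAVING COUNT(DISTINCT card_id) > 2
""").fetchall()

print(rows)
```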
Q.10 How would you approach a business problem where you need to analyze the spending patterns of premium customers?
Ans. Step-by-Step Structured Approach:
Step 1: Understand the Objective
• Clarify with stakeholders what "spending pattern" means.
o Is it frequency, amount, category, channel, or timing?
• Define “premium customer”:
o Based on credit score, card tier (e.g., Platinum, Centurion), monthly spend threshold, etc.
Step 2: Data Collection
• Gather relevant datasets:
o Customer table (ID, tier, demographics)
o Transactions table (amount, date, category, location)
o Cards table (card_type, limits, activation)
Step 5: Segmentation
• Use clustering or thresholds to group premium customers into:
o High spenders
o Frequent spenders
o Category loyalists (e.g., only travel)
• Identify anomalies or subgroups with unique patterns.
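As an illustration of the threshold option (not clustering), here is a rule-based sketch using Python's sqlite3 module. It assigns each customer to one of the segments listed above. The table name premium_txns and all cut-offs are hypothetical; real thresholds would come from the definitions agreed in Step 1. The CASE order sets priority, so a customer who qualifies for several segments gets the first one that matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE premium_txns (customer_id INTEGER, amount REAL, category TEXT)")
data = (
    [(1, 6000.0, "travel"), (1, 5000.0, "dining")]       # large totals
    + [(2, 50.0, "dining")] * 6 + [(2, 40.0, "retail")]  # many small purchases
    + [(3, 300.0, "travel"), (3, 450.0, "travel")]       # travel only
    + [(4, 100.0, "retail"), (4, 80.0, "dining")]        # none of the above
)
conn.executemany("INSERT INTO premium_txns VALUES (?, ?, ?)", data)

# Hypothetical cut-offs: total >= 10000, count >= 5, exactly one category.
segments = conn.execute("""
    SELECT customer_id,
           CASE
               WHEN SUM(amount) >= 10000 THEN 'High spender'
               WHEN COUNT(*) >= 5 THEN 'Frequent spender'
               WHEN COUNT(DISTINCT category) = 1 THEN 'Category loyalist'
               ELSE 'Other'
           END AS segment
    FROM premium_txns
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()

print(segments)
```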