PwC Data Analyst Interview

The document outlines methods for estimating smartphone sales and daily revenue from roadside tea stalls in India, using logical steps and assumptions based on population data. It also includes Python programming tasks such as finding unique pairs that sum to a target, checking for palindromes, and explaining deep vs shallow copy. Additionally, it covers SQL queries for cumulative revenue, top products by sales volume, identifying customers with consecutive purchases, and calculating user retention rates.

Uploaded by

ronit.kumar2802


PWC DATA ANALYST EXPERIENCE (1-3 yoe)

Guesstimate Questions:
1. Estimate the number of smartphones sold in India annually.

To guesstimate the annual number of smartphones sold in India, we can break the problem
into logical steps using assumptions and available population data. Here's a structured
approach:

Step 1: Population of India

India's population is approximately 1.4 billion people.

Step 2: Target Population (Smartphone Users)

Not everyone in India uses or purchases a smartphone. Let's segment the population:

• Assume 70% of the population is in the age group 15-60, which is the primary
smartphone user base.
1.4 billion × 70% = 980 million

• Assume 70% of this group can afford a smartphone or actively use one.
980 million × 70% = 686 million

Step 3: Replacement Cycle and New Users

• On average, a smartphone is replaced every 2-3 years. Taking the upper end of that
range, approximately one-third of smartphone users replace their phones annually.
686 million ÷ 3 ≈ 229 million

• Assume an additional 5-10% of the target population (new users) buys a
smartphone each year. Using 10%:
686 million × 10% = 68.6 million

Step 4: Total Annual Sales

Adding the replacements and new users:

229 million + 68.6 million = 297.6 million ≈ 300 million

Final Estimate:

Approximately 300 million smartphones are sold annually in India.

Assumptions Recap:
1. 70% of the population is in the primary age group for smartphone users.

2. 70% of this segment can afford smartphones.

3. Replacement cycle is 2-3 years.

4. Annual new users account for ~10% of the total target population.
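
The arithmetic in the steps above can be checked with a short Python sketch. The function and parameter names are illustrative (they do not appear in the original), and the defaults are simply the assumptions from the recap, with the replacement cycle taken as 3 years.

```python
def estimate_smartphone_sales(
    population=1_400_000_000,  # Step 1: population of India
    age_share=0.70,            # assumed share aged 15-60
    afford_share=0.70,         # assumed share of that group who can afford a phone
    replacement_years=3,       # assumed replacement cycle (upper end of 2-3 years)
    new_user_share=0.10,       # assumed new buyers as a share of the target group
):
    """Return (target users, replacements, new buyers, total annual sales)."""
    target_users = population * age_share * afford_share
    replacements = target_users / replacement_years
    new_buyers = target_users * new_user_share
    return target_users, replacements, new_buyers, replacements + new_buyers


target, replacements, new_buyers, total = estimate_smartphone_sales()
print(f"Target users:  {target / 1e6:.1f} million")
print(f"Replacements:  {replacements / 1e6:.1f} million")
print(f"New buyers:    {new_buyers / 1e6:.1f} million")
print(f"Total sales:   {total / 1e6:.1f} million")
```

Changing a single default (for example, a 2-year replacement cycle) shows how sensitive the final figure is to each assumption.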

Refinement:

This estimate can be refined further with real data on:

• Smartphone penetration rates.

• Market growth rates.

• Urban vs. rural demographics.

2. How would you estimate the daily revenue generated by roadside tea stalls across
India?

To estimate the daily revenue generated by roadside tea stalls across India, let's approach
this systematically using assumptions and logical reasoning:

Step 1: Population of India

India's population is approximately 1.4 billion.

Step 2: Target Population for Tea Consumption

Not everyone consumes tea from roadside stalls. Let’s segment the population:

• Assume 70% of the population (adults and older teens) regularly drink tea.
1.4 billion × 70% = 980 million

• Out of these, assume 50% of tea drinkers consume tea from roadside stalls (the rest
may make tea at home, go to cafes, or use other sources).
980 million × 50% = 490 million

Step 3: Daily Tea Consumption

• On average, tea drinkers consume 2 cups of tea daily.

• Not all cups are purchased from roadside stalls; assume 1 cup per person per day
is bought from such stalls.
490 million × 1 cup = 490 million cups per day

Step 4: Price of Tea

• The average price of tea at roadside stalls is approximately ₹10 per cup.

Step 5: Daily Revenue

• Multiply the daily consumption by the price per cup:

490 million cups × ₹10 = ₹4.9 billion

Final Estimate:

The daily revenue generated by roadside tea stalls across India is approximately ₹4.9
billion.

Assumptions Recap:

1. 70% of the population drinks tea.

2. 50% of tea drinkers buy from roadside stalls.

3. One cup per person is consumed daily at roadside stalls.

4. Average price per cup is ₹10.
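
As a quick check of the chain of multiplications, the estimate can be sketched in Python. The function and parameter names below are hypothetical; the default values are the four assumptions listed in the recap.

```python
def estimate_daily_tea_revenue(
    population=1_400_000_000,  # Step 1: population of India
    tea_drinker_share=0.70,    # assumed share who regularly drink tea
    roadside_share=0.50,       # assumed share of drinkers buying at roadside stalls
    cups_per_person=1,         # assumed roadside cups per buyer per day
    price_per_cup=10,          # assumed average price in rupees
):
    """Return (cups sold per day, daily revenue in rupees)."""
    buyers = population * tea_drinker_share * roadside_share
    cups = buyers * cups_per_person
    return cups, cups * price_per_cup


cups, revenue = estimate_daily_tea_revenue()
print(f"Cups per day:  {cups / 1e6:.0f} million")
print(f"Daily revenue: ₹{revenue / 1e9:.1f} billion")
```

Passing different values for `price_per_cup` or `roadside_share` is an easy way to explore the regional variations mentioned under Refinement.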

Refinement:

To improve this estimate:

• Factor in rural vs. urban consumption patterns (higher urban roadside tea stall
density).

• Adjust for regional variations in tea prices and consumption habits.

• Account for occasional tea drinkers or seasonal demand changes.

Python Questions:
1. Write a Python function to find all unique pairs of integers in a list that sum up to a given
target value.

Find All Unique Pairs That Sum to a Target

def find_pairs(nums, target):
    seen = set()   # numbers visited so far
    pairs = set()  # unique (smaller, larger) pairs
    for num in nums:
        complement = target - num
        if complement in seen:
            pairs.add((min(num, complement), max(num, complement)))
        seen.add(num)
    return sorted(pairs)  # sorted so the output order is deterministic

# Example usage:
nums = [2, 4, 3, 7, 5, 8, -1]
target = 7
print(find_pairs(nums, target))  # Output: [(-1, 8), (2, 5), (3, 4)]

2. Given a string, write a function to check if it’s a palindrome, ignoring spaces,

punctuation, and case sensitivity.

Check if a String Is a Palindrome

def is_palindrome(s):
    # Keep only letters and digits, then convert to lowercase
    filtered = ''.join(c for c in s if c.isalnum()).lower()
    return filtered == filtered[::-1]

# Example usage:
s = "A man, a plan, a canal, Panama!"
print(is_palindrome(s))  # Output: True

3. Explain the difference between deep copy and shallow copy in Python. When would you
use each?

Deep Copy vs. Shallow Copy

• Shallow Copy:

o Creates a new object but does not create copies of nested objects.

o Changes to mutable objects within the original will reflect in the copied
object.

o Example: Using copy.copy() or the copy() method of a list.

• Deep Copy:

o Creates a new object along with copies of all objects it contains, recursively.

o Changes to the original object do not affect the copied object.

o Example: Using copy.deepcopy().

Example:

import copy

original = [[1, 2], [3, 4]]

shallow = copy.copy(original)
deep = copy.deepcopy(original)

original[0][0] = 99

print(shallow) # Output: [[99, 2], [3, 4]]

print(deep) # Output: [[1, 2], [3, 4]]

Use Cases:

• Use shallow copy when you want to duplicate a structure but allow shared mutable
data.

• Use deep copy when creating a fully independent copy is necessary.
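
The sharing behavior can also be observed directly with identity checks. The sketch below reuses the nested list from the example; the variable names for the checks are illustrative.

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)
deep = copy.deepcopy(original)

# Both copies are new outer lists.
outer_is_new = shallow is not original and deep is not original

# A shallow copy reuses the same inner lists; a deep copy builds new ones.
shallow_shares_inner = shallow[0] is original[0]
deep_shares_inner = deep[0] is original[0]

# Appending to the original's outer list affects neither copy,
# because only the nested objects are shared by the shallow copy.
original.append([5, 6])

print(outer_is_new)                               # True
print(shallow_shares_inner, deep_shares_inner)    # True False
print(len(original), len(shallow), len(deep))     # 3 2 2
```

This is why a shallow copy is usually enough for flat lists of immutable values, while nested mutable structures call for a deep copy when independence matters.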

4. What are decorators in Python, and how do they work? Provide an example of a scenario
where a decorator would be useful.

Decorators in Python

Decorators are functions that modify the behavior of other functions or methods. A
decorator takes a function as input, wraps it with additional behavior, and returns the
wrapped function in place of the original.

Example of a Decorator:

def logger(func):
    def wrapper(*args, **kwargs):
        # Log the call, run the original function, then log the result
        print(f"Calling {func.__name__} with {args} and {kwargs}")
        result = func(*args, **kwargs)
        print(f"{func.__name__} returned {result}")
        return result
    return wrapper

@logger
def add(a, b):
    return a + b

# Example usage:
print(add(3, 5))

Output:

Calling add with (3, 5) and {}
add returned 8
8

When to Use:

Decorators are useful for:

1. Logging: Automatically log function calls.

2. Authentication: Check user permissions before executing a function.

3. Caching: Store results of expensive computations for reuse.
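
As an illustration of the caching use case, here is a minimal sketch of a hand-written caching decorator. The names cached, slow_square, and calls are hypothetical; in practice the standard library's functools.lru_cache covers the same need.

```python
import functools


def cached(func):
    """Cache results keyed by positional arguments (arguments must be hashable)."""
    store = {}

    @functools.wraps(func)  # keep the original name and docstring on the wrapper
    def wrapper(*args):
        if args not in store:
            store[args] = func(*args)  # compute only on the first call
        return store[args]

    return wrapper


calls = []  # records each time the underlying function actually runs


@cached
def slow_square(x):
    calls.append(x)
    return x * x


print(slow_square(4))  # computed
print(slow_square(4))  # served from the cache
print(len(calls))      # 1
```

Using functools.wraps is a common design choice for decorators: without it, the decorated function would report the wrapper's name instead of its own.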

SQL Questions:
1. Write a query to find the cumulative revenue by month for each product category in
a sales table.

Step 1: Create the Sales Table

CREATE TABLE sales (

id INT AUTO_INCREMENT PRIMARY KEY,

product_category VARCHAR(50),
revenue DECIMAL(10, 2),

sale_date DATE

);

Step 2: Insert Sample Records

INSERT INTO sales (product_category, revenue, sale_date) VALUES

('Electronics', 5000.00, '2024-01-15'),

('Electronics', 7000.00, '2024-02-10'),

('Electronics', 4000.00, '2024-03-05'),

('Clothing', 2000.00, '2024-01-20'),

('Clothing', 3000.00, '2024-02-15'),

('Clothing', 1500.00, '2024-03-01'),

('Groceries', 1000.00, '2024-01-10'),

('Groceries', 1200.00, '2024-02-12'),

('Groceries', 1300.00, '2024-03-08');

Step 3: Write the Query for Cumulative Revenue

SELECT

product_category,

DATE_FORMAT(sale_date, '%Y-%m') AS month,

SUM(revenue) AS monthly_revenue,

SUM(SUM(revenue)) OVER (PARTITION BY product_category ORDER BY

DATE_FORMAT(sale_date, '%Y-%m')) AS cumulative_revenue

FROM

sales

GROUP BY
product_category, DATE_FORMAT(sale_date, '%Y-%m')

ORDER BY

product_category, month;

Explanation:

1. DATE_FORMAT(sale_date, '%Y-%m'): Extracts the year and month from the

sale_date for grouping.

2. SUM(SUM(revenue)) OVER (PARTITION BY product_category ORDER BY

DATE_FORMAT(sale_date, '%Y-%m')): Calculates the cumulative revenue for each
product category by summing the monthly revenues in the specified order.

3. GROUP BY product_category, DATE_FORMAT(sale_date, '%Y-%m'): Groups the

data by product category and month.

Sample Output:

Product_Category   Month     Monthly_Revenue   Cumulative_Revenue
Clothing           2024-01   2000.00           2000.00
Clothing           2024-02   3000.00           5000.00
Clothing           2024-03   1500.00           6500.00
Electronics        2024-01   5000.00           5000.00
Electronics        2024-02   7000.00           12000.00
Electronics        2024-03   4000.00           16000.00
Groceries          2024-01   1000.00           1000.00
Groceries          2024-02   1200.00           2200.00
Groceries          2024-03   1300.00           3500.00
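
To make the running-total logic concrete, the same calculation can be sketched in plain Python on the sample rows. This is only an illustration of what the window function does, not a replacement for the SQL; the variable names are assumptions.

```python
from collections import defaultdict
from itertools import accumulate

# (product_category, revenue, sale_date) rows from the sample INSERT
sales = [
    ("Electronics", 5000.00, "2024-01-15"),
    ("Electronics", 7000.00, "2024-02-10"),
    ("Electronics", 4000.00, "2024-03-05"),
    ("Clothing", 2000.00, "2024-01-20"),
    ("Clothing", 3000.00, "2024-02-15"),
    ("Clothing", 1500.00, "2024-03-01"),
    ("Groceries", 1000.00, "2024-01-10"),
    ("Groceries", 1200.00, "2024-02-12"),
    ("Groceries", 1300.00, "2024-03-08"),
]

# GROUP BY category, month: the first 7 characters of the date are 'YYYY-MM'
monthly = defaultdict(float)
for category, revenue, sale_date in sales:
    monthly[(category, sale_date[:7])] += revenue

# PARTITION BY category ORDER BY month: running sum within each category
cumulative = {}
for category in sorted({cat for cat, _ in monthly}):
    months = sorted(m for cat, m in monthly if cat == category)
    running = list(accumulate(monthly[(category, m)] for m in months))
    for month, total in zip(months, running):
        cumulative[(category, month)] = total

for (category, month), total in sorted(cumulative.items()):
    print(category, month, monthly[(category, month)], total)
```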

2. How would you retrieve the top 5 products by sales volume, excluding any products that
had zero sales in the past 3 months?

Step 1: Create the Products Table

CREATE TABLE product_sales (

product_id INT AUTO_INCREMENT PRIMARY KEY,

product_name VARCHAR(50),

sales_volume INT,

sale_date DATE

);

Step 2: Insert Sample Records

INSERT INTO product_sales (product_name, sales_volume, sale_date) VALUES

('Product A', 150, '2024-10-01'),

('Product A', 200, '2024-11-01'),

('Product A', 180, '2024-12-01'),

('Product B', 100, '2024-10-01'),

('Product B', 0, '2024-11-01'),

('Product B', 50, '2024-12-01'),

('Product C', 250, '2024-10-15'),

('Product C', 300, '2024-11-15'),

('Product C', 400, '2024-12-15'),

('Product D', 0, '2024-10-10'),

('Product D', 0, '2024-11-10'),

('Product D', 0, '2024-12-10'),

('Product E', 500, '2024-10-05'),

('Product E', 600, '2024-11-05'),

('Product E', 700, '2024-12-05');

Step 3: Write the Query

WITH recent_sales AS (
    SELECT
        product_name,
        SUM(sales_volume) AS total_sales,
        MAX(CASE WHEN sale_date >= CURDATE() - INTERVAL 3 MONTH THEN sales_volume
            ELSE 0 END) AS recent_sales_flag
    FROM
        product_sales
    WHERE
        sale_date >= CURDATE() - INTERVAL 3 MONTH
    GROUP BY
        product_name
),
valid_products AS (
    SELECT
        product_name,
        total_sales
    FROM
        recent_sales
    WHERE
        recent_sales_flag > 0
)
SELECT
    product_name,
    total_sales
FROM
    valid_products
ORDER BY
    total_sales DESC
LIMIT 5;

Explanation:

1. recent_sales CTE:

o Calculates the total sales for each product over the past 3 months (the WHERE
clause restricts the rows to that window).

o Uses CASE to flag whether a product had non-zero sales in the past 3
months.

2. valid_products CTE:

o Filters out products with zero sales in all the past 3 months using
recent_sales_flag > 0.

3. Final Query:

o Retrieves the top 5 products by total sales from valid_products.

o Orders the results in descending order of total sales and limits the output to
the top 5 products.

Expected Output (assuming the query is run in late December 2024, so that all sample
rows fall within the 3-month window):

Product_Name   Total_Sales
Product E      1800
Product C      950
Product A      530
Product B      150
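
The filtering and ranking steps can be sketched in Python on the same sample rows. Because the SQL depends on the current date, the sketch uses a fixed cutoff date; that cutoff (what CURDATE() - INTERVAL 3 MONTH would give on 2024-12-31) and the variable names are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# (product_name, sales_volume, sale_date) rows from the sample INSERT
product_sales = [
    ("Product A", 150, date(2024, 10, 1)), ("Product A", 200, date(2024, 11, 1)),
    ("Product A", 180, date(2024, 12, 1)), ("Product B", 100, date(2024, 10, 1)),
    ("Product B", 0, date(2024, 11, 1)), ("Product B", 50, date(2024, 12, 1)),
    ("Product C", 250, date(2024, 10, 15)), ("Product C", 300, date(2024, 11, 15)),
    ("Product C", 400, date(2024, 12, 15)), ("Product D", 0, date(2024, 10, 10)),
    ("Product D", 0, date(2024, 11, 10)), ("Product D", 0, date(2024, 12, 10)),
    ("Product E", 500, date(2024, 10, 5)), ("Product E", 600, date(2024, 11, 5)),
    ("Product E", 700, date(2024, 12, 5)),
]

# Assumed stand-in for CURDATE() - INTERVAL 3 MONTH, evaluated on 2024-12-31.
cutoff = date(2024, 9, 30)

# Sum sales per product inside the window (the WHERE + GROUP BY step).
totals = defaultdict(int)
for name, volume, sale_date in product_sales:
    if sale_date >= cutoff:
        totals[name] += volume

# Drop products with no sales in the window, then take the top 5 by volume.
top_products = sorted(
    ((name, total) for name, total in totals.items() if total > 0),
    key=lambda item: item[1],
    reverse=True,
)[:5]

for name, total in top_products:
    print(name, total)
```

Product D is dropped because every one of its rows in the window has a volume of zero.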

3. Given a table of customer transactions, identify all customers who made purchases in
two or more consecutive months.

To solve this, we'll assume the following table structure for customer transactions:

Step 1: Create the Transactions Table

CREATE TABLE customer_transactions (

transaction_id INT AUTO_INCREMENT PRIMARY KEY,

customer_id INT,

transaction_date DATE,

amount DECIMAL(10, 2)

);

Step 2: Insert Sample Records

INSERT INTO customer_transactions (customer_id, transaction_date, amount) VALUES

(1, '2024-01-15', 100.00),

(1, '2024-02-10', 200.00),

(1, '2024-04-05', 150.00),

(2, '2024-01-20', 300.00),

(2, '2024-02-15', 400.00),

(2, '2024-03-01', 500.00),

(3, '2024-01-25', 250.00),

(3, '2024-03-10', 300.00),

(4, '2024-02-05', 150.00),

(4, '2024-03-07', 200.00),

(4, '2024-04-15', 250.00);

Step 3: Write the Query

WITH monthly_transactions AS (
    -- One row per customer per month, keyed by the first day of that month
    SELECT DISTINCT
        customer_id,
        CAST(DATE_FORMAT(transaction_date, '%Y-%m-01') AS DATE) AS transaction_month
    FROM
        customer_transactions
),
consecutive_months AS (
    SELECT
        t1.customer_id,
        t1.transaction_month AS month1,
        t2.transaction_month AS month2
    FROM
        monthly_transactions t1
    JOIN
        monthly_transactions t2
    ON
        t1.customer_id = t2.customer_id
        AND t2.transaction_month = DATE_ADD(t1.transaction_month, INTERVAL 1 MONTH)
)
SELECT DISTINCT
    customer_id
FROM
    consecutive_months;

Explanation:

1. monthly_transactions CTE:

o Reduces each transaction to its customer and month. The month is stored as a
DATE holding the first day of that month, so that date arithmetic can be applied
to it.

o SELECT DISTINCT ensures we have a unique list of months in which a customer
made purchases.

2. consecutive_months CTE:

o Joins monthly_transactions with itself to find customers with consecutive
months.

o Uses DATE_ADD(t1.transaction_month, INTERVAL 1 MONTH) to calculate the first
day of the next month and checks if it matches t2.transaction_month.

3. Final Query:

o Selects unique customer IDs from the consecutive_months CTE.

Sample Output:

Customer_ID
1
2
4

Notes:

• Customer 1: Purchased in January and February.

• Customer 2: Purchased in January, February, and March.

• Customer 4: Purchased in February, March, and April.

• Customer 3: Skipped February, so they are not included in the output.
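
The consecutive-month check can be sketched in Python on the sample data. The idea, assumed here for illustration, is to map each date to a single month number (year × 12 + month) so that consecutive months differ by exactly 1, including across a year boundary.

```python
from datetime import date

# (customer_id, transaction_date) pairs from the sample INSERT
transactions = [
    (1, date(2024, 1, 15)), (1, date(2024, 2, 10)), (1, date(2024, 4, 5)),
    (2, date(2024, 1, 20)), (2, date(2024, 2, 15)), (2, date(2024, 3, 1)),
    (3, date(2024, 1, 25)), (3, date(2024, 3, 10)),
    (4, date(2024, 2, 5)), (4, date(2024, 3, 7)), (4, date(2024, 4, 15)),
]


def month_index(d):
    """Map a date to a running month number; December -> January differs by 1."""
    return d.year * 12 + d.month


# Distinct active months per customer (the monthly_transactions step)
months_by_customer = {}
for customer_id, d in transactions:
    months_by_customer.setdefault(customer_id, set()).add(month_index(d))

# A customer qualifies if some active month is followed by the next month
consecutive = sorted(
    customer_id
    for customer_id, months in months_by_customer.items()
    if any(m + 1 in months for m in months)
)
print(consecutive)  # [1, 2, 4]
```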

4. Write a query to calculate the retention rate of users on a monthly basis.

Retention Rate Definition

The retention rate is the percentage of users who return in a subsequent month after their
initial activity.

Assumptions

• We have a table called user_activity with the following structure:

o user_id: Unique identifier for each user.

o activity_date: Date of the user's activity.

Step 1: Create the Table

CREATE TABLE user_activity (

user_id INT,

activity_date DATE

);

Step 2: Insert Sample Records

INSERT INTO user_activity (user_id, activity_date) VALUES

(1, '2024-01-15'),

(1, '2024-02-10'),

(1, '2024-03-20'),
(2, '2024-01-20'),

(2, '2024-02-15'),

(3, '2024-02-05'),

(3, '2024-03-10'),

(4, '2024-01-25'),

(5, '2024-02-18'),

(5, '2024-03-15'),

(6, '2024-03-01');

Step 3: Query to Calculate Retention Rate

WITH first_month_activity AS (
    SELECT
        user_id,
        DATE_FORMAT(MIN(activity_date), '%Y-%m') AS first_active_month
    FROM
        user_activity
    GROUP BY
        user_id
),
monthly_retention AS (
    SELECT
        fma.first_active_month,
        DATE_FORMAT(ua.activity_date, '%Y-%m') AS active_month,
        COUNT(DISTINCT ua.user_id) AS retained_users
    FROM
        user_activity ua
    JOIN
        first_month_activity fma
    ON
        ua.user_id = fma.user_id
    GROUP BY
        fma.first_active_month, DATE_FORMAT(ua.activity_date, '%Y-%m')
),
monthly_cohort AS (
    SELECT
        first_active_month,
        COUNT(DISTINCT user_id) AS cohort_size
    FROM
        first_month_activity
    GROUP BY
        first_active_month
)
SELECT
    mr.first_active_month,
    mr.active_month,
    mr.retained_users,
    mc.cohort_size,
    ROUND((mr.retained_users / mc.cohort_size) * 100, 2) AS retention_rate
FROM
    monthly_retention mr
JOIN
    monthly_cohort mc
ON
    mr.first_active_month = mc.first_active_month
ORDER BY
    mr.first_active_month, mr.active_month;

Explanation

1. first_month_activity CTE:

o Determines the first active month for each user.

2. monthly_retention CTE:

o Counts the number of users retained for each combination of their first
active month and subsequent activity months.

3. monthly_cohort CTE:

o Calculates the size of the cohort for each first active month (the total number
of users who first became active in that month).

4. Final Query:

o Joins monthly_retention and monthly_cohort to calculate the retention rate as:
Retention Rate = (Retained Users / Cohort Size) × 100

o Orders the results by the first active month and the active month.

Sample Output

First_Active_Month   Active_Month   Retained_Users   Cohort_Size   Retention_Rate
2024-01              2024-01        3                3             100.00
2024-01              2024-02        2                3             66.67
2024-01              2024-03        1                3             33.33
2024-02              2024-02        2                2             100.00
2024-02              2024-03        2                2             100.00
2024-03              2024-03        1                1             100.00

Interpretation

• First Active Month: The cohort of users who became active in that month.

• Active Month: The months in which users returned.

• Retention Rate: The percentage of the cohort that returned in subsequent months.
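
The cohort logic can be traced by hand with a small Python sketch over the sample rows. The dictionary names are illustrative; each step corresponds loosely to one of the CTEs described in the Explanation.

```python
from collections import defaultdict

# (user_id, activity_date) rows from the sample INSERT
activity = [
    (1, "2024-01-15"), (1, "2024-02-10"), (1, "2024-03-20"),
    (2, "2024-01-20"), (2, "2024-02-15"),
    (3, "2024-02-05"), (3, "2024-03-10"),
    (4, "2024-01-25"),
    (5, "2024-02-18"), (5, "2024-03-15"),
    (6, "2024-03-01"),
]

# first_month_activity: earliest 'YYYY-MM' per user
first_month = {}
for user_id, day in activity:
    month = day[:7]
    if user_id not in first_month or month < first_month[user_id]:
        first_month[user_id] = month

# monthly_cohort: users grouped by their first active month
cohort_users = defaultdict(set)
for user_id, month in first_month.items():
    cohort_users[month].add(user_id)

# monthly_retention: distinct active users per (cohort month, activity month)
active = defaultdict(set)
for user_id, day in activity:
    active[(first_month[user_id], day[:7])].add(user_id)

# Retention rate = retained users / cohort size * 100
retention = {
    (cohort, month): round(100 * len(users) / len(cohort_users[cohort]), 2)
    for (cohort, month), users in active.items()
}

for (cohort, month), rate in sorted(retention.items()):
    print(cohort, month, len(active[(cohort, month)]), len(cohort_users[cohort]), rate)
```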

5. Find the nth highest salary from an employee table, where n is a parameter passed
dynamically to the query.

To find the nth highest salary dynamically, we can sort the distinct salaries in descending
order and use the LIMIT clause to skip the first n − 1 salaries and retrieve the nth one.
Here's how:

Table Creation and Sample Data

CREATE TABLE employees (

employee_id INT PRIMARY KEY,

employee_name VARCHAR(50),

salary DECIMAL(10, 2)

);

INSERT INTO employees (employee_id, employee_name, salary) VALUES

(1, 'Alice', 60000.00),

(2, 'Bob', 75000.00),

(3, 'Charlie', 85000.00),

(4, 'David', 50000.00),

(5, 'Eve', 85000.00);

Query for nth Highest Salary

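SET @n := 2;            -- Set the value of n dynamically
SET @offset := @n - 1;  -- Number of higher salaries to skip

-- MySQL does not accept a user variable or expression directly in LIMIT,
-- so the offset is bound through a prepared statement.
PREPARE stmt FROM
    'SELECT DISTINCT salary FROM employees ORDER BY salary DESC LIMIT ?, 1';
EXECUTE stmt USING @offset;
DEALLOCATE PREPARE stmt;

Explanation

1. @n Variable:

o Dynamically sets the rank n for the desired salary.

2. DISTINCT salary:

o Ensures unique salaries are considered in case of duplicates.

3. ORDER BY salary DESC:

o Orders salaries in descending order, ranking the highest salary first.

4. LIMIT ?, 1 with @offset = n − 1:

o Skips the top n − 1 salaries and retrieves the next one. The offset is passed as a
prepared-statement parameter because LIMIT in a plain MySQL query only
accepts integer constants.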

Alternative Query Using Window Functions (MySQL 8.0+)

If the database supports window functions, you can use the DENSE_RANK() function:

SET @n := 2; -- Set the value of n dynamically

WITH ranked_salaries AS (
    SELECT
        salary,
        DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank
    FROM
        employees
)
SELECT DISTINCT salary
FROM ranked_salaries
WHERE salary_rank = @n;

Explanation (Window Functions)

1. DENSE_RANK():

o Assigns a rank to each salary in descending order, with no gaps. Duplicate
salaries get the same rank.

o The alias is salary_rank because RANK is a reserved word in MySQL 8.0.

2. WITH ranked_salaries:

o Creates a temporary result set with salaries and their respective ranks.

3. WHERE salary_rank = @n:

o Filters the result to return only the nth rank. DISTINCT collapses employees who
share that salary into a single row.

Sample Output

For n = 2:

Salary

75000.00

Key Notes
• Use the DISTINCT keyword to handle duplicate salaries for the LIMIT method.

• Use DENSE_RANK() if you want to consider duplicate salaries as a single rank.
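
The same "distinct, sort descending, take the nth" idea can be sketched in Python with the sample salaries. The function name is hypothetical, and returning None for an out-of-range n is a choice made for this sketch.

```python
def nth_highest_salary(salaries, n):
    """Return the nth highest distinct salary, or None if there is no such rank."""
    distinct = sorted(set(salaries), reverse=True)  # DISTINCT + ORDER BY ... DESC
    if 1 <= n <= len(distinct):
        return distinct[n - 1]                      # skip n - 1, take the next one
    return None


salaries = [60000.00, 75000.00, 85000.00, 50000.00, 85000.00]
print(nth_highest_salary(salaries, 1))  # 85000.0
print(nth_highest_salary(salaries, 2))  # 75000.0
print(nth_highest_salary(salaries, 5))  # None (only 4 distinct salaries)
```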

6. Explain how indexing works in SQL and how to decide which columns should be indexed
for optimal performance.

How Indexing Works in SQL

An index is a database structure that improves the speed of data retrieval operations on a
table. It works like an optimized lookup table for the database, allowing it to quickly locate
rows without scanning the entire table.

• Structure: Most indexes are implemented as balanced tree structures (e.g., B-trees)
or hash tables. These structures allow efficient searching, insertion, and deletion
operations.

• Function: When a query is executed, the database engine checks if an index is

available for the columns involved in the query’s filters or joins. If so, the engine
uses the index to locate the rows, reducing the need for a full table scan.
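
As a rough analogy only (real B-tree indexes are considerably more elaborate), the difference between a full scan and an indexed lookup can be sketched in Python with a sorted list and binary search. The table contents and function names below are hypothetical.

```python
import bisect

# Hypothetical "table": rows of (user_id, email), stored in insertion order
rows = [(i, f"user{i}@example.com") for i in range(1000)]


def full_scan(email):
    """Check every row until a match is found: O(n) comparisons."""
    for position, (_, row_email) in enumerate(rows):
        if row_email == email:
            return position
    return None


# "Index": emails kept in sorted order, each pointing back to its row position
index = sorted((email, position) for position, (_, email) in enumerate(rows))
keys = [email for email, _ in index]


def index_lookup(email):
    """Binary search over the sorted keys: O(log n) comparisons."""
    i = bisect.bisect_left(keys, email)
    if i < len(keys) and keys[i] == email:
        return index[i][1]
    return None


print(full_scan("user742@example.com"))     # 742
print(index_lookup("user742@example.com"))  # 742
print(index_lookup("missing@example.com"))  # None
```

The sketch also hints at the cost side: the sorted structure takes extra memory and would need updating on every insert, which mirrors the write and storage overhead of a database index.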

Types of Indexes

1. Primary Index:

o Automatically created for the primary key column.

o Ensures unique values and quick lookups for primary key operations.

2. Unique Index:

o Ensures that all values in the indexed column are unique.

3. Clustered Index:

o Reorders the physical storage of table data to match the index order.

o A table can have only one clustered index.

4. Non-clustered Index:

o Creates a separate structure to store the index and points to the table rows.

o A table can have multiple non-clustered indexes.

5. Composite Index:

o Indexes multiple columns together.

6. Full-Text Index:

o Optimized for searching text data, such as finding words or phrases in large
text fields.

Benefits of Indexing

• Faster Query Execution: Speeds up SELECT, JOIN, and WHERE clause operations.

• Reduced I/O Operations: Fewer rows are read from the disk.

• Sorted Data Retrieval: Helps with ORDER BY and GROUP BY clauses.

Drawbacks of Indexing

• Slower Write Operations: INSERT, UPDATE, and DELETE operations become slower
because the index must also be updated.

• Storage Overhead: Indexes consume additional disk space.

• Maintenance Overhead: Indexes need to be maintained, especially in tables with

frequent data modifications.

How to Decide Which Columns to Index

1. Frequently Queried Columns:

o Index columns that appear frequently in WHERE, JOIN, ON, ORDER BY, or
GROUP BY clauses.

2. Primary Keys and Unique Constraints:

o Always index primary key columns as they uniquely identify rows.

3. Foreign Keys:

o Index foreign key columns to improve JOIN performance.

4. High-Selectivity Columns:
o Choose columns with a wide range of unique values (e.g., a user_id column)
because indexes work best with high selectivity.

5. Composite Indexes:

o Use composite indexes when multiple columns are often queried together.
For example, for queries like:

SELECT * FROM sales WHERE year = 2023 AND region = 'North';

A composite index on (year, region) will perform better than individual indexes.

6. Avoid Low-Selectivity Columns:

o Avoid indexing columns with few distinct values (e.g., gender or status with
values like 'Active' or 'Inactive').

7. Read-Heavy Tables:

o Index columns in tables where SELECT operations are more frequent than
INSERT/UPDATE/DELETE.

Examples

Scenario 1: Searching by email in a user table

CREATE INDEX idx_email ON users(email);

• Improves performance for queries like:

SELECT * FROM users WHERE email = '[email protected]';

Scenario 2: Composite index for a sales table

CREATE INDEX idx_year_region ON sales(year, region);

• Optimizes queries with:

SELECT * FROM sales WHERE year = 2023 AND region = 'North';

Scenario 3: Indexing a foreign key

CREATE INDEX idx_customer_id ON orders(customer_id);

• Speeds up JOINs like:

SELECT * FROM orders o JOIN customers c ON o.customer_id = c.customer_id;

Monitoring and Tuning

1. EXPLAIN Plan:

o Use EXPLAIN to analyze how the database executes a query and whether it
uses an index.

2. Query Performance Metrics:

o Monitor slow queries and identify columns for potential indexing.

3. Index Maintenance:

o Periodically rebuild or reorganize indexes to ensure they remain efficient.

Summary

• Use indexes on frequently queried, high-selectivity columns.

• Avoid excessive indexing on write-heavy tables.

• Analyze query patterns and use tools like EXPLAIN to make data-driven decisions
about indexing.

7. Describe the differences between LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN and
when to use each one in a complex query.

Differences Between LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN

In SQL, JOIN operations combine rows from two or more tables based on a related column.
The differences among LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN lie in how
unmatched rows are handled.

1. LEFT JOIN

• Definition: Returns all rows from the left table and the matched rows from the right
table. If no match is found, the result contains NULL for columns from the right
table.

• Use Case: Use when you want all records from the left table regardless of whether
there is a match in the right table.
Syntax

SELECT columns

FROM table1

LEFT JOIN table2

ON table1.common_column = table2.common_column;

Example

• Tables:

o Customers:

CustomerID Name

1 Alice

2 Bob

3 Charlie

o Orders:

OrderID CustomerID

101 1

102 2

• Query:

SELECT c.Name, o.OrderID

FROM Customers c

LEFT JOIN Orders o

ON c.CustomerID = o.CustomerID;

• Result:

Name      OrderID
Alice     101
Bob       102
Charlie   NULL

2. RIGHT JOIN

• Definition: Returns all rows from the right table and the matched rows from the left
table. If no match is found, the result contains NULL for columns from the left table.

• Use Case: Use when you want all records from the right table regardless of whether
there is a match in the left table.

Syntax

SELECT columns

FROM table1

RIGHT JOIN table2

ON table1.common_column = table2.common_column;

Example

• Query:

SELECT c.Name, o.OrderID

FROM Customers c

RIGHT JOIN Orders o

ON c.CustomerID = o.CustomerID;

• Result:

Name OrderID

Alice 101

Bob 102
3. FULL OUTER JOIN

• Definition: Combines the results of LEFT JOIN and RIGHT JOIN. Returns all rows
from both tables, with NULL in columns where no match exists.

• Use Case: Use when you want to include all records from both tables, showing
unmatched rows with NULL values.

Syntax

SELECT columns

FROM table1

FULL OUTER JOIN table2

ON table1.common_column = table2.common_column;

Example

• Query:

SELECT c.Name, o.OrderID

FROM Customers c

FULL OUTER JOIN Orders o

ON c.CustomerID = o.CustomerID;

• Result:

Name OrderID

Alice 101

Bob 102

Charlie NULL

When to Use Each Join in Complex Queries

1. LEFT JOIN:

o When the left table contains a primary set of data and you want to include all
rows, even if they have no matching data in the right table.
o Example: Listing all customers, including those who haven't made any
orders.

2. RIGHT JOIN:

o When the right table contains a primary set of data and you want to include
all rows, even if they have no matching data in the left table.

o Example: Listing all orders, including those made by unregistered

customers.

3. FULL OUTER JOIN:

o When both tables are equally important, and you want to analyze all data
points, even unmatched rows.

o Example: Creating a comprehensive report that includes all customers and

all orders, showing unmatched customers or orders.

Key Differences in a Nutshell

Feature                 LEFT JOIN               RIGHT JOIN             FULL OUTER JOIN
Rows from Left Table    Always Included         Only if Matched        Always Included
Rows from Right Table   Only if Matched         Always Included        Always Included
Unmatched Rows          NULL in Right Columns   NULL in Left Columns   NULL in Both Columns

Visual Representation

If A represents rows from the left table and B represents rows from the right table:

• LEFT JOIN: A ∪ (A ∩ B)

• RIGHT JOIN: B ∪ (A ∩ B)

• FULL OUTER JOIN: A ∪ B
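
The three joins can be mimicked in Python on the Customers and Orders tables from the examples, with None standing in for NULL. The helper functions are illustrative, and the extra order 103 for customer 9 is a hypothetical row added so that the right side also has an unmatched row.

```python
customers = {1: "Alice", 2: "Bob", 3: "Charlie"}  # CustomerID -> Name
orders = [(101, 1), (102, 2)]                      # (OrderID, CustomerID)


def left_join(customers, orders):
    """Every customer appears; customers without orders get None."""
    result = []
    for customer_id, name in customers.items():
        matches = [oid for oid, cid in orders if cid == customer_id]
        if matches:
            result.extend((name, oid) for oid in matches)
        else:
            result.append((name, None))
    return result


def right_join(customers, orders):
    """Every order appears; orders without a known customer get None."""
    return [(customers.get(cid), oid) for oid, cid in orders]


def full_outer_join(customers, orders):
    """Left join plus the orders that matched no customer."""
    result = left_join(customers, orders)
    result += [(None, oid) for oid, cid in orders if cid not in customers]
    return result


# Hypothetical orphan order: customer 9 is not in the Customers table
orders_with_orphan = orders + [(103, 9)]

print(left_join(customers, orders_with_orphan))
print(right_join(customers, orders_with_orphan))
print(full_outer_join(customers, orders_with_orphan))
```

With the original two orders, the left and full outer results are identical, because no order lacks a customer; the orphan row is what makes the three results differ.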

Performance Tips

• Use LEFT JOIN or RIGHT JOIN instead of FULL OUTER JOIN if you only need one
side's unmatched rows, as it reduces computation.

• Always use indexes on the columns used in the ON clause to improve performance
in joins.

8. What is the difference between HAVING and WHERE clauses in SQL, and when would
you use each?

Difference Between HAVING and WHERE Clauses in SQL

1. WHERE Clause

• Purpose: Filters rows before any aggregation.

• Scope: Applied before any GROUP BY operation.

• Use Case: Used to filter rows based on conditions applied to individual columns.

Syntax:

SELECT column1, column2, ...

FROM table_name

WHERE condition;

Example:

SELECT customer_name, total_orders

FROM orders

WHERE total_orders > 50;

• Explanation: Keeps only the rows whose total_orders value is greater than 50. The
filter is applied to individual rows, before any aggregation.

2. HAVING Clause

• Purpose: Filters the aggregated results (after applying GROUP BY).

• Scope: Applied after the GROUP BY operation.

• Use Case: Used to filter aggregated data based on conditions applied to aggregate
functions like SUM, AVG, COUNT, etc.

Syntax:

SELECT column1, aggregate_function(column2)

FROM table_name

GROUP BY column1

HAVING condition;

Example:

SELECT category, COUNT(order_id) AS total_orders

FROM orders

GROUP BY category

HAVING total_orders > 10;

• Explanation: Filters the aggregated results (only categories with total_orders > 10
are included).

Key Differences

Feature      WHERE Clause                                     HAVING Clause
Purpose      Filters rows before aggregation                  Filters aggregated results
Scope        Applied to the table rows individually           Applied to the grouped results
Usage        Used to filter individual rows                   Used to filter aggregated results
Conditions   Can apply conditions to non-aggregated columns   Can apply conditions to aggregated columns
             (before grouping)                                (after grouping)
Example      WHERE total_orders > 50                          HAVING COUNT(order_id) > 10

When to Use Each

1. Use WHERE when:

o You need to filter rows based on conditions before performing any
aggregation.

o Example: Filtering customer records where the order count is more than 50.

2. Use HAVING when:

o You need to filter the results of an aggregation.

o Example: Counting orders by category and filtering categories with more than
10 orders.

Practical Scenario

-- Example using both WHERE and HAVING

SELECT category, COUNT(order_id) AS total_orders

FROM orders

WHERE order_date >= '2024-01-01' -- Filtering based on date before aggregation

GROUP BY category

HAVING total_orders > 10; -- Filtering aggregated results

This will show categories with more than 10 orders placed on or after January 1, 2024.
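
The order of the two filters can be illustrated with a Python sketch: rows are filtered first (WHERE), then grouped and counted, and only then are the groups filtered (HAVING). The order rows below are made up for illustration.

```python
from collections import Counter

# Hypothetical (category, order_date) rows
orders = (
    [("Electronics", "2024-02-01")] * 12
    + [("Electronics", "2023-12-15")] * 5
    + [("Clothing", "2024-03-10")] * 4
    + [("Clothing", "2023-11-20")] * 9
)

# WHERE order_date >= '2024-01-01': filter individual rows before grouping
recent = [(category, day) for category, day in orders if day >= "2024-01-01"]

# GROUP BY category with COUNT(order_id)
counts = Counter(category for category, _ in recent)

# HAVING total_orders > 10: filter the aggregated groups
result = {category: n for category, n in counts.items() if n > 10}
print(result)  # {'Electronics': 12}

# Without the WHERE step, both categories would pass the HAVING filter
all_counts = Counter(category for category, _ in orders)
print({category: n for category, n in all_counts.items() if n > 10})
```

Clothing has 13 orders overall but only 4 in 2024, so it passes the HAVING test only when the WHERE filter is omitted.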

5 pages
Basic Audit Data Analytics
No ratings yet
Basic Audit Data Analytics
142 pages