All SQL interviews
1) Retrieve the total number of customers who placed orders from each eatery since January 2021
across all countries. Return the eatery's name, the total number of customers, and their names in one
row.
2) For each eatery, identify the customer who placed the 3rd order. Return the eatery's name and the
customer's name.
3) Identify the Pakistani eatery with the highest cumulative order value since January 2021. Provide the
name of the customer who contributed the most to the total order value for that eatery, along with the
eatery's name and the total value of orders.
4) Determine which customers ordered from "Crunchy Delights" and "Sizzling Bites" in the year 2021 in
Pakistan. Return the names of the customers, the eatery names, the maximum order value placed by the
customer for each eatery, and the maximum order value placed by that customer across both eateries.
5) Find the eatery with the highest number of orders placed by customers from the United States since
January 2021. Return the eatery's name and the total number of orders.
6) Retrieve the names of customers who have placed orders from at least three different eateries in 2021.
Return the names of these customers along with the count of eateries they've ordered from.
In data analyst interviews, 50% of the time the hiring manager moves on to questions on Excel after
SQL, and my colleagues and friends report the same. After discussing with them, I have prepared
a list of questions for you. Here is the breakdown:
The interviewer shows you some datasets and gives you scenarios to solve:
Imagine you have a dataset containing sales figures for different products across various regions. The
management wants to calculate the total sales for each product category. How would you use the SUMIF
function to accomplish this task efficiently?
You have a large dataset with customer information including names, email addresses, and phone
numbers. Your task is to extract the domain names from the email addresses provided. Explain how you
would use the RIGHT, LEFT, and FIND functions in combination to achieve this.
You're given a dataset containing sales data for a retail store over the past year. Your manager wants to
analyze the sales performance by month and product category. Describe how you would create a pivot
table to summarize and visualize this information effectively.
Your company has conducted a survey with multiple-choice questions. The survey results are stored in
an Excel sheet with each respondent's choices recorded in separate columns. How would you use pivot
tables to analyze the survey data and present the distribution of responses for each question?
🔄 Power Query Scenario 1:
You receive a CSV file with inconsistent date formats in one of the columns. Some dates are in
"MM/DD/YYYY" format, while others are in "DD/MM/YYYY" format. Explain how you would use Power
Query to standardize the date format across the entire column.
🔄 Power Query Scenario 2:
You're working with a large dataset that contains duplicate rows. Before proceeding with analysis, you
need to remove these duplicate rows to ensure data accuracy. How would you use Power Query to
identify and remove duplicate records from the dataset efficiently?
What is the difference between the VLOOKUP and INDEX-MATCH functions in Excel? When would you
use one over the other?
What are some advantages of using Power Query over traditional data manipulation techniques in Excel?
Provide examples.
How would you use the AVERAGEIF and AVERAGEIFS functions in Excel to calculate the average sales for
a specific product category within a given date range?
What are "Slicers" in Excel pivot tables, and how can they enhance data analysis and visualization?
Explain the process of merging queries in Power Query. How does it help in combining data from
different sources for analysis?
2. Provide a use case for each of the functions Rank, Dense_Rank & Row_Number ( 💡 majority struggle )
8. Scenario based Joins question, understanding of Inner, Left and Outer Joins via simple yet tricky
question
9. LAG: write a query to find all records where the transaction value is greater than the previous
transaction value
10. Rank vs Dense Rank: query to find the 2nd highest salary of an employee
11. Write a query to find the running difference (ideal solution uses a window function)
14. Write a query to find the running difference using a self join (helps in understanding the logical
approach; ideally this question is solved with a window function)
15. Write a query to find the cumulative sum using a self join
(helps in understanding the logical approach; ideally this question is solved with a window function)
16. Optimize a query to find the latest transaction for each customer
17. Explore the benefits of CTEs (Common Table Expressions) in a real-world scenario
18. Write a query to calculate the percentile rank of a given value in a dataset
20. Craft a query to identify and remove duplicates based on specific criteria
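The ranking questions in the list above can be tried out with a small sketch. It runs the SQL through Python's built-in sqlite3 module so it is runnable as-is; the emp table, its columns, and the sample rows are made up for illustration, and it assumes an SQLite build with window-function support (3.25 or newer).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (name TEXT, salary INTEGER);  -- hypothetical table
    INSERT INTO emp VALUES ('Asha', 900), ('Bilal', 900), ('Chen', 700), ('Dev', 500);
""")

# ROW_NUMBER never repeats, RANK leaves a gap after ties, DENSE_RANK does not.
ranked = conn.execute("""
    SELECT name, salary,
           ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS row_num,
           RANK()       OVER (ORDER BY salary DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC)       AS dense_rnk
    FROM emp
    ORDER BY salary DESC, name
""").fetchall()

# 2nd highest salary: DENSE_RANK = 2 still works when the top salary is tied.
second_highest = conn.execute("""
    SELECT DISTINCT salary
    FROM (SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
          FROM emp) AS ranked_emp
    WHERE dense_rnk = 2
""").fetchall()

for row in ranked:
    print(row)
print(second_highest)
```

With RANK, the tied top salaries push the next salary to rank 3, so filtering on RANK = 2 would return nothing here; that is the usual reason DENSE_RANK is preferred for "Nth highest" questions.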
FROM emp
2. Write a SQL query to find the days when the temperature was higher than its previous dates.
(Columns: Days, Temp)
SELECT t1.Days
FROM table_name t1
WHERE Temp > (SELECT Temp FROM table_name t2 WHERE t2.Days = t1.Days - 1);
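The same question can also be answered with LAG instead of a correlated subquery. The sketch below is one possible version, run through Python's sqlite3 module; the weather table name and the sample rows are assumptions, and the Days and Temp columns follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE weather (Days INTEGER, Temp INTEGER);  -- hypothetical table name
    INSERT INTO weather VALUES (1, 20), (2, 25), (3, 22), (4, 30);
""")

# LAG(Temp) fetches the previous row's temperature in Days order.
# The first day has no previous row, so its comparison is NULL and it is dropped.
warmer_days = conn.execute("""
    SELECT Days, Temp
    FROM (SELECT Days, Temp, LAG(Temp) OVER (ORDER BY Days) AS prev_temp
          FROM weather) AS with_prev
    WHERE Temp > prev_temp
    ORDER BY Days
""").fetchall()

print(warmer_days)
```

Unlike the `t1.Days - 1` subquery, this version compares each row with the previous available row, so it behaves differently if some days are missing from the table.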
SELECT MAX(salary) FROM Employee WHERE salary < (SELECT MAX(salary) FROM Employee);
GROUP BY department;
FROM Employee
GROUP BY department;
SELECT DISTINCT salary FROM Employee ORDER BY salary DESC LIMIT n-1, 1;
SELECT e.*
FROM Employee e
FROM Sales
GROUP BY product_id;
Recently a candidate was interviewed by India's biggest airline for an Executive Data Analyst role.
Questions -
2) The interviewer made up three tables on the spot and asked me to apply joins and give the
result.
3) Define the WHERE and HAVING clauses. They gave me a table and conditions, and I had to apply the
right clause. Can the WHERE clause be used with GROUP BY?
4) Rate yourself in Power BI, SQL, and Microsoft Excel out of 10. When I answered, they told me to
always rate myself high.
5) Define the different types of lookup functions. What is the difference between the LOOKUP and INDEX
functions in Excel? Show how these functions work with an example.
6) Which formulas do you know in Excel? Define Advanced Excel and explain what it is used for.
9) Describe your projects. What type of analysis did you do in your projects?
11) Define Pandas. What is statistical analysis? What types of statistical analysis have you done?
• Let's unravel this mystery with a relatable analogy and simple language!
Scenario:
• Imagine you're hosting a party with two guest lists – one for friends and another for colleagues.
• Now, let's use SQL joins to merge these lists and see who's coming together!
1. Inner Join:
• It brings together only the guests who are on both the friends' and colleagues' lists.
• It's like inviting friends who are also colleagues or vice versa.
Syntax:
SELECT * FROM Friends
INNER JOIN Colleagues ON Friends.email = Colleagues.email;
Example:
If "John" is on both lists with the same email, the inner join will include him in the final guest list since
he's both a friend and a colleague.
2. Left Join:
• Now, let's say you're more inclined to prioritize your friends over colleagues.
• A left join fetches all guests from the friends' list and matches them with colleagues if they're present.
Syntax:
SELECT * FROM Friends
LEFT JOIN Colleagues ON Friends.email = Colleagues.email;
Example:
Even if "Mary" is on the friends' list but not the colleagues' list, the left join ensures she still gets an
invite, leaving the colleague slot empty.
3. Right Join:
• A right join prioritizes colleagues, ensuring all colleagues get an invite, even if they're not on the
friends' list.
Syntax:
SELECT * FROM Friends
RIGHT JOIN Colleagues ON Friends.email = Colleagues.email;
Example:
If "David" is a colleague but not a friend, the right join ensures he's included in the guest list alongside
his colleague companions.
4. Full Outer Join:
• A full outer join invites everyone on either list, matching friends with colleagues where possible.
Syntax:
SELECT * FROM Friends
FULL OUTER JOIN Colleagues ON Friends.email = Colleagues.email;
Example:
Whether "Emily" is a friend, a colleague, or both, the full outer join ensures she receives an invite to
your inclusive gathering.
5. Self Join:
• Think of a self join as inviting someone to the party and asking them to bring their clone!
• It's like looking at the same guest list twice and making connections within the same table.
Syntax:
SELECT * FROM Guests f
JOIN Guests g ON f.email = g.email AND f.name <> g.name;
Example:
If "Alice" and "Bob" share the same email on the guest list, a self join will pair them together, creating a
unique networking opportunity within the same table.
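To see the party analogy in action, here is a small sketch run through Python's sqlite3 module. The Friends and Colleagues tables, their name and email columns, and the sample guests are assumptions made for illustration. Because older SQLite builds lack RIGHT JOIN, the right join is written as a LEFT JOIN with the tables swapped, which returns the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Friends (name TEXT, email TEXT);     -- hypothetical schema
    CREATE TABLE Colleagues (name TEXT, email TEXT);  -- hypothetical schema
    INSERT INTO Friends VALUES ('John', 'john@example.com'), ('Mary', 'mary@example.com');
    INSERT INTO Colleagues VALUES ('John', 'john@example.com'), ('David', 'david@example.com');
""")

# Inner join: only guests present on both lists.
inner_rows = conn.execute("""
    SELECT f.name FROM Friends f
    INNER JOIN Colleagues c ON f.email = c.email
    ORDER BY f.name
""").fetchall()

# Left join: every friend, with the colleague slot empty (None) when unmatched.
left_rows = conn.execute("""
    SELECT f.name, c.name FROM Friends f
    LEFT JOIN Colleagues c ON f.email = c.email
    ORDER BY f.name
""").fetchall()

# Right join, emulated: every colleague, with the friend slot empty when unmatched.
right_rows = conn.execute("""
    SELECT f.name, c.name FROM Colleagues c
    LEFT JOIN Friends f ON f.email = c.email
    ORDER BY c.name
""").fetchall()

print(inner_rows, left_rows, right_rows, sep="\n")
```

John appears in all three results, Mary only in the left join, and David only in the right join, which matches the examples in the text.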
SQL queries that are commonly asked during interviews: part 3.0
FROM Orders
2. Find the customer who has placed the highest number of orders:
SELECT customer_id, COUNT(*) AS num_orders
FROM Orders
GROUP BY customer_id
ORDER BY num_orders DESC
LIMIT 1;
SELECT *
FROM Employee
FROM Employee e1
FROM Orders;
FROM Orders
SELECT *
FROM Customers
SELECT *
FROM Products
9. Retrieve the average time taken to ship orders for each shipping method:
FROM Orders
GROUP BY shipping_method;
10. Find the total number of unique customers who made purchases in each year:
SELECT EXTRACT(YEAR FROM order_date) AS year, COUNT(DISTINCT customer_id) AS num_customers
FROM Orders
GROUP BY EXTRACT(YEAR FROM order_date);
3. JOINs (Inner/Left/Right)
6. String processing
8. Subquery
1. Sampling
3. Descriptive Statistics
4. p-value
5. Probability Distributions
6. t-test
7. ANOVA
8. Correlation
9. Linear Regression
Here's how Excel can help you before you dive into SQL:
In Excel, we use VLOOKUP to bring together data from different sheets. It's just like using JOINS in SQL to
get data from more than one table.
Excel's SUM and COUNT functions are like practice for SQL queries. They help you add up and count
things, which is what you often do in SQL.
Excel's IF statements let you make choices with your data. This is similar to using WHERE in SQL to pick
specific data.
Both Excel and SQL have ways to work with dates and text. Learning these in Excel first can make it easier
when you switch to SQL.
Ever used pivot tables in Excel? They're a good start for understanding the GROUP BY function in SQL,
which helps you organize and summarize data.
Excel's XLOOKUP and hyperlinks are like SQL's ways of finding and linking data. They give you a peek into
how SQL finds and connects information.
You will be asked questions on SQL in interviews for sure! Make sure to practice 2-3 questions daily, it
can't be mastered overnight!
SQL queries that are commonly asked during interviews
FROM Orders
GROUP BY product_id;
42. Find the total number of orders and the average order amount for each customer:
SELECT customer_id, COUNT(*) AS total_orders, AVG(amount) AS avg_order_amount
FROM Orders
GROUP BY customer_id;
43. List the products that have been sold more than 100 times:
SELECT product_id
FROM Sales
GROUP BY product_id
HAVING COUNT(*) > 100;
44. Retrieve the email addresses of customers who have not made any purchases:
SELECT email
FROM Customers
WHERE customer_id NOT IN (SELECT customer_id FROM Orders WHERE customer_id IS NOT NULL);
45. Calculate the total number of days each product has been in stock:
SELECT product_id, DATEDIFF(DAY, MIN(stock_date), MAX(stock_date)) AS total_days_in_stock
FROM Stock
GROUP BY product_id;
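46. Find the departments with the highest and lowest average employee salaries:
Highest average
SELECT department, AVG(salary) AS avg_salary
FROM Employee
GROUP BY department
ORDER BY avg_salary DESC
LIMIT 1;
Lowest average
SELECT department, AVG(salary) AS avg_salary
FROM Employee
GROUP BY department
ORDER BY avg_salary ASC
LIMIT 1;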
47. List the customers who have made purchases in all months of the year:
SELECT customer_id
FROM Orders
GROUP BY customer_id
HAVING COUNT(DISTINCT EXTRACT(MONTH FROM order_date)) = 12;
SELECT product_id,
FROM Sales
GROUP BY product_id;
49. Retrieve the employees who have joined in the last quarter:
SELECT *
FROM Employee
50. List the products that have never been out of stock:
SELECT product_id
FROM Products
WHERE product_id NOT IN (SELECT DISTINCT product_id FROM Stock WHERE stock_quantity = 0);
FROM Orders
GROUP BY customer_id;
52. Find the customers who have placed orders on consecutive days:
FROM Orders o1
53. Calculate the total revenue generated from each product category:
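SELECT p.category_id, SUM(o.amount) AS total_revenue
FROM Orders o
JOIN Products p ON o.product_id = p.product_id
GROUP BY p.category_id;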
54. Retrieve the top 3 most profitable products based on total revenue:
SELECT product_id, SUM(amount) AS total_revenue
FROM Orders
GROUP BY product_id
ORDER BY total_revenue DESC LIMIT 3;
55. Find the number of employees in each salary range (e.g., 0-50000, 50001-100000, etc.):
SELECT CONCAT(FLOOR(salary/50000)*50000 + 1, '-', FLOOR(salary/50000)*50000 + 50000) AS
salary_range, COUNT(*) AS num_employees
FROM Employee
GROUP BY FLOOR(salary/50000);
56. Retrieve the top 5 most frequent words from a text column:
FROM (
FROM table_name
) AS words
GROUP BY word
LIMIT 5;
57. Calculate the percentage change in sales amount compared to the previous month for each product:
SELECT product_id,
FROM Sales
GROUP BY product_id;
58. List the customers who have placed orders for all products:
SELECT customer_id
FROM Orders
GROUP BY customer_id
HAVING COUNT(DISTINCT product_id) = (SELECT COUNT(*) FROM Products);
59. Retrieve the orders placed by customers who have not logged in to the system in the last 30 days:
SELECT *
FROM Orders
WHERE customer_id IN (
SELECT customer_id
FROM Customers
FROM (
FROM OrderDetails
GROUP BY order_id
) AS order_products;
Optimize Joins: Minimize the number of joins and use appropriate join types (e.g., INNER JOIN, LEFT
JOIN) to ensure efficient data retrieval.
Avoid SELECT * : Instead of selecting all columns using SELECT *, explicitly specify only the columns
needed for the query to reduce unnecessary data transfer and processing overhead.
Use WHERE Clause Wisely: Filter rows early in the query using WHERE clause to reduce the dataset size
before joining or aggregating data.
Avoid Subqueries: Where a subquery is slow (correlated subqueries especially), try rewriting it as a JOIN
or a Common Table Expression (CTE) and compare the execution plans, since the faster form depends on
the database.
Limit the Use of DISTINCT: Minimize the use of DISTINCT as it requires sorting and duplicate removal,
which can be resource-intensive for large datasets.
Optimize GROUP BY and ORDER BY: Use GROUP BY and ORDER BY clauses judiciously, and ensure that
they are using indexed columns whenever possible to avoid unnecessary sorting.
Consider Partitioning: Partition large tables so that queries touching only some partitions can skip the
rest, which can improve query performance by reducing I/O operations.
Monitor Query Performance: Regularly monitor query performance using tools like query execution
plans, database profiler, and performance monitoring tools to identify and address bottlenecks.
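As a rough illustration of checking an execution plan before and after adding an index, here is a sketch using Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN. The Orders schema, the index name idx_orders_customer, and the generated rows are assumptions; other databases expose plans through their own commands and the plan text differs between versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO Orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

# Explicit column list instead of SELECT *, with the filter in the WHERE clause.
query = "SELECT order_id, amount FROM Orders WHERE customer_id = 42"


def plan(sql):
    """Return the human-readable plan text (last column of each plan row)."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan(query)  # no usable index yet: a full scan of Orders
conn.execute("CREATE INDEX idx_orders_customer ON Orders (customer_id)")
plan_after = plan(query)   # the planner can now search via the new index

print(plan_before)
print(plan_after)
```

The query text is unchanged between the two runs; only the available index differs, which is why monitoring plans is a useful habit when tuning.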
2. String Functions
4. Mathematical Functions
5. Conditional Functions
6. Conversion Functions
CAST/CONVERT: Converts data from one data type to another.
numbers.
After a few years of experience you will realize that just writing SQL queries is not sufficient;
optimizing the queries for better performance is important.
➡ Using numerical data type and not character data type for storing numerical data.
SELECT customer_id
FROM Orders
GROUP BY customer_id
62. Find the average time taken to ship orders for each product category:
FROM Orders o
GROUP BY p.category_id;
63. Retrieve the customers who have placed orders for more than 10 unique products in a single order:
FROM (
FROM OrderDetails
) AS order_products
64. Calculate the total sales for each product category in the last quarter:
SELECT p.category_id, SUM(o.amount) AS total_sales_last_quarter
FROM Orders o
GROUP BY p.category_id;
FROM (
FROM Employee_Departments
) AS multi_department_employees
66. Retrieve the orders with the highest and lowest order amounts:
UNION
67. Find the top 3 most common pairs of products bought together:
SELECT product_id1, product_id2, COUNT(*) AS pair_count
FROM (
) AS product_pairs
LIMIT 3;
68. Calculate the percentage of total orders for each product category:
FROM Orders o
GROUP BY p.category_id;
69. Retrieve the customers who have made purchases on weekends only:
SELECT customer_id
FROM Orders
GROUP BY customer_id
HAVING MIN(EXTRACT(ISODOW FROM order_date)) >= 6;
3. Components in Power BI
6. What is RDBMS ?
7. What are the window functions in SQL? Why do we use them? (Explain all.)
( Based on the ranking of the length of the movies in each rating category)
10 10
10 10
11 14
12 NULL
NULL 12
12 NULL
12. What is VLOOKUP, why do we use it , Syntax of VLOOKUP ( Asked 1 question on VLOOKUP.)
71. Retrieve the customers who have made purchases of at least three different products in each
category:
SELECT customer_id
) AS customer_products_per_category
GROUP BY customer_id
72. Find the top 3 most common words in a text column excluding common stop words ("and", "the",
"is", etc.):
FROM (
FROM table_name
) AS words
WHERE word NOT IN ('and', 'the', 'is', 'of', 'a', 'to', 'in', 'it')
GROUP BY word
ORDER BY frequency DESC
LIMIT 3;
73. Calculate the average number of orders per month for each customer:
SELECT customer_id, AVG(num_orders) AS avg_orders_per_month
FROM (
FROM Orders
) AS monthly_orders
GROUP BY customer_id;
74. Retrieve the products that have been out of stock for the longest continuous period:
FROM (
FROM Stock
) AS stock_groups
WHERE stock_quantity = 0
LIMIT 1;
75. List the employees who have worked in all departments:
SELECT employee_id
FROM (
FROM Employee_Departments
GROUP BY employee_id
) AS employee_department_counts
GROUP BY employee_id
76. Retrieve the products that have experienced a decrease in sales amount for each consecutive month
for the last three months:
SELECT product_id
FROM (
SELECT product_id,
SUM(amount) AS total_amount
FROM Sales
) AS sales_per_month
WHERE rn <= 3
GROUP BY product_id
77. Find the average length of time between orders for each customer:
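-- A window function cannot be nested inside an aggregate, so compute each gap in a subquery first.
SELECT customer_id, AVG(days_since_prev) AS avg_time_between_orders
FROM (
    SELECT customer_id,
           DATEDIFF(day, LAG(order_date) OVER(PARTITION BY customer_id ORDER BY order_date),
                    order_date) AS days_since_prev
    FROM Orders
) AS order_gaps
GROUP BY customer_id;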
81. Retrieve the customers who have made purchases of all products within a specific category:
SELECT customer_id
FROM (
FROM Orders o
) AS customer_product_count
82. Find the top 3 most popular product categories based on the total number of orders:
83. Retrieve the orders with the highest and lowest total order amounts within each product category:
FROM (
SELECT o.order_id, o.product_id, p.category_id, SUM(o.amount) AS order_amount
FROM Orders o
) AS order_amounts
84. List the customers who have made purchases in all product categories:
86. Find the employees who have not been assigned to any department:
87. Retrieve the orders with the highest and lowest total order quantities within each product category:
) AS order_quantities
88. List the customers who have made purchases on both weekdays and weekends:
SELECT customer_id FROM Orders
GROUP BY customer_id
HAVING COUNT(DISTINCT CASE WHEN EXTRACT(ISODOW FROM order_date) <= 5 THEN 'weekday' ELSE
'weekend' END) = 2;
91. Retrieve the top 5 customers with the highest average order amounts:
SELECT customer_id, AVG(amount) AS avg_order_amount
FROM Orders
GROUP BY customer_id
ORDER BY avg_order_amount DESC
LIMIT 5;
92. Find the top 3 most frequent combinations of products bought together (pairs):
FROM (
) AS product_pairs
GROUP BY product1, product2
LIMIT 3;
93. List the customers who have made purchases in every month of a given year:
SELECT customer_id
FROM Orders
GROUP BY customer_id
94. Retrieve the top 5 most profitable months based on total sales amount:
FROM Orders
LIMIT 5;
95. Find the customers who have placed orders on the same day they registered as customers:
FROM Customers c
96. Retrieve the orders with the highest and lowest total order amounts for each month:
WITH monthly_orders AS (
FROM Orders
SELECT *
FROM (
FROM monthly_orders
) AS highest_orders
WHERE rank_highest = 1
UNION
SELECT *
FROM (
FROM monthly_orders
) AS lowest_orders
WHERE rank_lowest = 1;
97. List the products that have been sold at least once every month for the past year:
SELECT product_id
FROM (
FROM Sales
) AS monthly_sales
GROUP BY product_id
98. Retrieve the customers who have made purchases of all products with a specific attribute (e.g., color,
size):
SELECT customer_id
FROM (
FROM Orders o
) AS customer_product_count
GROUP BY customer_id
HAVING COUNT(*) = (SELECT COUNT(*) FROM Products WHERE attribute = 'color'); -- Specify the attribute here
SQL:
1. What is the difference between INNER JOIN, LEFT JOIN, and RIGHT JOIN in SQL? When would you use
each type of join?
2. Explain the concept of normalization in database design. Why is it important, and what are the
different normal forms?
3. Discuss the role of indexes in SQL databases. How do indexes improve query performance, and what
are some best practices for index usage?
❇ Basic Level
4. Explain the difference between CHAR and VARCHAR data types in SQL.
✅Intermediate Level
2. Describe the differences between INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN.
✅ Advanced Level
4. What is Normalization ?
commands?
Excel:
1. How would you use Excel to clean and preprocess a dataset before analysis? Can you give examples of
common data cleaning techniques and Excel functions you might use?
2. Explain the difference between absolute and relative cell references in Excel formulas. When would
you use each type, and provide examples?
3. Walk me through the process of creating a pivot table in Excel. What are the key steps, and how would
you customize the pivot table to analyze specific aspects of the data?
4. Can you demonstrate how to use Excel's "VLOOKUP" function to perform a lookup operation between
two datasets? What are some alternatives to VLOOKUP, and when would you use them?
5. How can you use Excel's "What-If Analysis" tools such as Scenario Manager or Goal Seek to perform
predictive modeling or sensitivity analysis?
SQL:
1. Write a SQL query to retrieve all orders from a specific customer from a table named "Orders" and join
it with the "Customers" table to include customer details.
2. Explain the difference between the "WHERE" and "HAVING" clauses in SQL. When would you use each
clause, and provide examples?
3. How do you handle NULL values in SQL queries? Can you demonstrate how to filter, replace, or handle
NULL values in a SQL query?
4. Describe the process of creating an index in SQL. When would you create an index, and what are some
factors to consider when designing indexes for a database?
5. What is a subquery in SQL, and how would you use it to perform complex queries or data
transformations? Provide an example of a scenario where a subquery would be useful.
What it means: The table you're referencing isn't present in the database.
Connection Issues:
What it means: There's difficulty establishing communication with the database server.
Fix: Verify your internet connection. If it's stable, seek assistance from your technical support team.
Too Many Rows:
What it means: Your query may return a larger dataset than anticipated.
Fix: Refine your query by adding filters or applying limits to narrow down the results.
What it means: Your join conditions are not properly specified, leading to unexpected results.
Fix: Review your join conditions to ensure they accurately connect the related tables.
What it means: Your query is missing a WHERE clause, resulting in unintended data retrieval.
Fix: Include a WHERE clause to filter the results based on specific conditions.
What it means: The order in which operations are executed in your query is not as intended.
Fix: Use parentheses to explicitly specify the order of operations in complex queries.
What it means: Depending on the database and its collation, identifiers or string comparisons can be
case-sensitive, and mismatches in case can lead to errors.
Fix: Ensure consistency in casing for table names, column names, and keywords throughout your queries.
What it means: Changes made within a transaction are not committed properly, leading to data
inconsistency.
Fix: Explicitly commit transactions after making changes to ensure data integrity.
Indexing Problems:
What it means: Missing or improper indexing can result in slow query performance.
Fix: Analyze query execution plans and consider adding appropriate indexes to optimize performance.
What it means: Misuse of aggregate functions (e.g., SUM, AVG) can lead to incorrect results.
Fix: Ensure that aggregate functions are applied to the correct columns and properly grouped data.
Syntax Error:
Fix: Review your code carefully. Look for missed commas, brackets, or spelling errors.
What it means: You're attempting to insert a blank value where it's not allowed.
Fix: Ensure all required fields are filled with appropriate information.
Duplicate Entry:
What it means: You're inserting data that already exists in the database.
Permission Issues:
What it means: You lack the necessary permissions for the operation you're attempting.
Fix: Confirm you have the appropriate permissions, or seek assistance from your IT team.
Wrong Data Type:
What it means: You're trying to input data of an incorrect type into a field.
Fix: Check the expected data type for the field and ensure compatibility with your input.
Types of Keys?
Hello DataFam!
🔶 Here are the most-asked SQL interview questions. They are very basic, but people often forget
to pay attention to the basics, and the basics are the key to landing your dream job.
## Question 1.
## Question 2.
## Question 3.
Difference between Unique, primary keys, foreign keys.
## Question 4.
## Question 5.
## Question 6.
Imagine there is a FULL_NAME column in a table which has values like “Elon Musk”, “Bill Gates”, “Jeff
Bezos” etc. So each full name has a first name, a space and a last name. Which functions would you use
to fetch only the first name from this FULL_NAME column? Give an example.
## Question 7.
How can you convert a text into date format? Consider the given text as “31-01-2021”.
## Question 8.
What is the difference between LEFT, RIGHT, FULL outer join and INNER join?
## Question 10.
Can we use aggregate function as window function? If yes then how do we do it?
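For Question 10, one way to show an aggregate used as a window function is to add an OVER clause to it. The sketch below runs through Python's sqlite3 module; the Sales table, its columns, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales (sale_day INTEGER, amount INTEGER);  -- hypothetical table
    INSERT INTO Sales VALUES (1, 100), (2, 50), (3, 70);
""")

# SUM(...) OVER (ORDER BY ...) gives a running total per row;
# SUM(...) OVER () gives the grand total repeated on every row.
running = conn.execute("""
    SELECT sale_day, amount,
           SUM(amount) OVER (ORDER BY sale_day) AS running_total,
           SUM(amount) OVER ()                  AS grand_total
    FROM Sales
    ORDER BY sale_day
""").fetchall()

for row in running:
    print(row)
```

Unlike GROUP BY, the windowed aggregate keeps every input row, so detail columns and totals can sit side by side.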
Basic Level 🚀
4. Explain the difference between CHAR and VARCHAR data types in SQL.
Intermediate Level 🚀
2. Describe the differences between INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN.
7. What is an Index?
Advanced Level 🚀
1. Which is faster between CTE and Subquery?
4. What is Normalization?
What it means: You've unintentionally created a Cartesian join, resulting in a huge, unintended result set.
Fix: Review your join conditions to ensure they properly connect related tables, preventing Cartesian
products.
Using Reserved Keywords Incorrectly:
What it means: Attempting to use SQL reserved keywords as identifiers (e.g., table or column names).
Fix: Enclose reserved keywords in backticks or double quotes to use them as identifiers, or avoid using
them altogether.
Inefficient Subqueries:
What it means: Subqueries that are poorly optimized and lead to slow query performance.
Fix: Rewrite subqueries to optimize their execution or consider alternative approaches like JOINs or
derived tables.
What it means: Forgetting to define foreign key constraints can lead to data integrity issues.
Fix: Ensure that appropriate foreign key constraints are defined to maintain referential integrity between
related tables.
What it means: Storing redundant or duplicate data, leading to data inconsistency and inefficiency.
Fix: Normalize your database schema by breaking down tables into smaller, related tables to reduce
redundancy and improve data integrity.
What it means: Making changes to the database without wrapping them in transactions, risking data
inconsistencies.
Fix: Utilize transactions to ensure that a series of database operations either all succeed or all fail,
maintaining data consistency.
Fix: Follow best practices such as parameterized queries, input validation, and role-based access control
to enhance database security.
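A minimal sketch of why parameterized queries matter, using Python's sqlite3 module. The Customers table and the malicious input string are invented for the demonstration; the `?` placeholder syntax is the one sqlite3 uses, and other drivers use their own placeholder style.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, name TEXT);  -- hypothetical
    INSERT INTO Customers (name) VALUES ('Alice'), ('Bob');
""")

user_input = "Alice' OR '1'='1"  # hostile input pretending to be a name

# Parameterized: the driver passes the value separately, so it is compared as plain text.
safe_rows = conn.execute(
    "SELECT name FROM Customers WHERE name = ?", (user_input,)
).fetchall()

# String concatenation (do not do this): the input rewrites the WHERE clause.
unsafe_sql = "SELECT name FROM Customers WHERE name = '" + user_input + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

print(safe_rows)    # no customer has that literal name
print(unsafe_rows)  # the injected OR condition matches every row
```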
This SQL question is from Amazon's interview for the Data Scientist role. Give it a try!
Question:
Given a table of transactions and products, write a function to get the month_over_month change in
revenue for the year 2019. Make sure to round month_over_month to 2 decimal places.
It took me around 10-15 minutes to frame my thoughts, type, and submit. Not to forget the errors, haha.
My thought process:
2. Calculated month-over-month sales growth using the LAG() function to retrieve the previous month's
sales.
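The question does not give the table layouts, so the following is only a sketch under stated assumptions: transactions has product_id, quantity, and created_at columns, products has id and price, revenue is quantity times price, and month-over-month means the fractional change from the previous month. It runs through Python's sqlite3 module, so the month is extracted with SQLite's strftime.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schema; the original question does not list the columns.
    CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL);
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY, product_id INTEGER, quantity INTEGER, created_at TEXT
    );
    INSERT INTO products VALUES (1, 10.0), (2, 20.0);
    INSERT INTO transactions (product_id, quantity, created_at) VALUES
        (1, 10, '2019-01-15'),  -- January revenue: 100
        (2, 6,  '2019-02-10'),  -- February revenue: 120
        (1, 9,  '2019-03-05'),  -- March revenue: 90
        (2, 1,  '2020-01-01');  -- outside 2019, filtered out
""")

mom = conn.execute("""
    WITH monthly AS (
        SELECT CAST(strftime('%m', t.created_at) AS INTEGER) AS month_num,
               SUM(t.quantity * p.price) AS revenue
        FROM transactions t
        JOIN products p ON p.id = t.product_id
        WHERE strftime('%Y', t.created_at) = '2019'
        GROUP BY month_num
    )
    SELECT month_num,
           ROUND((revenue - LAG(revenue) OVER (ORDER BY month_num))
                 / LAG(revenue) OVER (ORDER BY month_num), 2) AS month_over_month
    FROM monthly
    ORDER BY month_num
""").fetchall()

for row in mom:
    print(row)
```

January has no previous month, so its change is NULL (None in Python) rather than zero.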
Atomicity: Ensures that all operations in a transaction are completed successfully, or none of them are.
Consistency: Guarantees that the database remains in a consistent state before and after the
transaction.
Isolation: Ensures that concurrent transactions do not interfere with each other and are executed
independently.
Durability: Ensures that once a transaction is committed, its changes are permanently saved and cannot
be lost.
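Atomicity is the easiest of the four properties to demonstrate. In this sketch (Python's sqlite3 module, with a made-up Accounts table and a CHECK constraint that forbids negative balances), a transfer fails halfway and the whole transaction is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Accounts (
        name TEXT PRIMARY KEY,
        balance INTEGER CHECK (balance >= 0)  -- hypothetical rule: no overdrafts
    );
    INSERT INTO Accounts VALUES ('A', 100), ('B', 50);
""")

try:
    # "with conn" commits if the block succeeds and rolls back if an exception escapes.
    with conn:
        conn.execute("UPDATE Accounts SET balance = balance + 150 WHERE name = 'B'")
        # This debit would make A negative, so the CHECK constraint rejects it.
        conn.execute("UPDATE Accounts SET balance = balance - 150 WHERE name = 'A'")
except sqlite3.IntegrityError:
    pass  # the credit to B above was rolled back together with the failed debit

balances = conn.execute("SELECT name, balance FROM Accounts ORDER BY name").fetchall()
print(balances)
```

Neither account changes: the credit that succeeded is undone because the debit in the same transaction failed.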
Normalization: It is the process of organizing data in a database efficiently by reducing redundancy and
dependency. It involves dividing large tables into smaller ones and defining relationships between them
to minimize redundancy and improve data integrity.
Use the IS NULL or IS NOT NULL operators to check for NULL values in columns.
Use the COALESCE() function to replace NULL values with a specified default value.
Use the IFNULL() function in MySQL or the ISNULL() function in SQL Server to replace NULL values with a
specified default value.
Handle NULL values appropriately in WHERE clauses, JOIN conditions, and aggregate functions to ensure
accurate query results.
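The points above can be checked with a short sketch run through Python's sqlite3 module. The Customers table and its rows are invented; COALESCE is standard SQL, while IFNULL and ISNULL are the vendor-specific variants mentioned in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (name TEXT, phone TEXT);  -- hypothetical table
    INSERT INTO Customers VALUES ('Alice', '555-0100'), ('Bob', NULL);
""")

# IS NULL finds the missing value; "= NULL" never matches because NULL = NULL is unknown.
missing = conn.execute("SELECT name FROM Customers WHERE phone IS NULL").fetchall()
equals_null = conn.execute("SELECT name FROM Customers WHERE phone = NULL").fetchall()

# COALESCE substitutes a default for NULL.
filled = conn.execute(
    "SELECT name, COALESCE(phone, 'unknown') FROM Customers ORDER BY name"
).fetchall()

# Aggregates skip NULLs: COUNT(*) counts rows, COUNT(phone) counts non-NULL phones.
counts = conn.execute("SELECT COUNT(*), COUNT(phone) FROM Customers").fetchone()

print(missing, equals_null, filled, counts, sep="\n")
```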
UNION: Combines the result sets of two or more SELECT statements into a single result set and removes
duplicate rows.
UNION ALL: Also combines the result sets of two or more SELECT statements into a single result set but
retains all rows, including duplicates.
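A quick way to see the difference, sketched with Python's sqlite3 module and two throwaway tables (a and b are made-up names) that share one value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (2), (3);
""")

# UNION removes the duplicate 2; UNION ALL keeps both copies.
union_rows = conn.execute(
    "SELECT x FROM a UNION SELECT x FROM b ORDER BY x"
).fetchall()
union_all_rows = conn.execute(
    "SELECT x FROM a UNION ALL SELECT x FROM b ORDER BY x"
).fetchall()

print(union_rows)
print(union_all_rows)
```

UNION ALL skips the duplicate-removal step, which is why it is usually the cheaper choice when duplicates are impossible or acceptable.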
SQL best practices:
✔ Use table aliases with columns when you are joining multiple tables
✔ Add useful comments wherever you write complex logic and avoid too many comments.
✔ Avoid wildcards at beginning of predicates (something like '%abc' will cause full table scan to get the
results)
✔ Considering cardinality within GROUP BY can make it faster (try to consider unique column first in
group by list)
✔ Create CTEs instead of multiple sub queries , it will make your query easy to read.
✔ Join tables using JOIN keywords instead of writing join condition in where clause for better readability.
✔ If you know there are no duplicates in 2 tables, use UNION ALL instead of UNION for better
performance
✔ Always start WHERE clause with 1 = 1. This has the advantage of easily commenting out conditions
while debugging a query.
✔ Take care of NULL values before using equality or comparison operators, apply window functions
where they fit, and filter rows before joining and before the HAVING clause.
✔ Make sure the join conditions between two tables use keys or indexed attributes.
Constraints in SQL
In SQL Server, constraints are rules defined on a table column that enforce the integrity and accuracy of
the data stored in the table. Here are some common constraints used in SQL Server:
1. PRIMARY KEY Constraint:
Ensures that a column (or a combination of columns) contains unique values, and no NULL values are
allowed. A table can have only one primary key.
Syntax
Column2 VARCHAR(50)
);
2. FOREIGN KEY Constraint:
Enforces referential integrity between two tables. The foreign key in one table refers to the primary key
in another table.
Syntax
OrderDate DATE
);
3. UNIQUE Constraint:
Ensures that all values in a column (or a combination of columns) are unique. Unlike a primary key, it
allows NULL, although SQL Server permits only one NULL per unique column.
Syntax
Column2 VARCHAR(50)
);
4. CHECK Constraint:
Syntax
Column2 VARCHAR(50)
);
5. DEFAULT Constraint:
Provides a default value for a column if no value is specified during the INSERT operation.
Syntax
Column2 VARCHAR(50) )
Syntax
Column2 VARCHAR(50)
);
These constraints help maintain the integrity and reliability of the data stored in SQL Server tables by
specifying rules and conditions that must be satisfied. They play a crucial role in database design and
data consistency.
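To see these constraints reject bad data, here is a minimal sketch. It assumes Python's built-in sqlite3 module rather than SQL Server, so the DDL is SQLite-flavoured, and the Customers and Orders tables, their columns, and the helper is_rejected are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE Customers (
        CustomerID INTEGER PRIMARY KEY,
        Email      TEXT UNIQUE,
        Country    TEXT NOT NULL DEFAULT 'Unknown',
        Age        INTEGER CHECK (Age >= 18)
    );
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER REFERENCES Customers(CustomerID),
        OrderDate  TEXT
    );
    INSERT INTO Customers (CustomerID, Email, Age) VALUES (1, 'a@example.com', 30);
""")

def is_rejected(sql):
    """Return True when the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

results = {
    "duplicate primary key": is_rejected(
        "INSERT INTO Customers (CustomerID, Age) VALUES (1, 40)"),
    "duplicate unique email": is_rejected(
        "INSERT INTO Customers (Email, Age) VALUES ('a@example.com', 40)"),
    "failed check": is_rejected(
        "INSERT INTO Customers (Email, Age) VALUES ('b@example.com', 10)"),
    "missing parent row": is_rejected(
        "INSERT INTO Orders (CustomerID) VALUES (99)"),
}

# Country was not supplied for customer 1, so the DEFAULT value was stored.
default_country = conn.execute(
    "SELECT Country FROM Customers WHERE CustomerID = 1").fetchone()[0]
print(results, default_country)
```

Each of the four statements breaks a different rule, so every one should be rejected while the first valid row keeps its default Country.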
Complete SQL (topic-wise) and Python interview questions from MNCs for entry-level positions:
SQL keywords
Data types
Operators
CREATE TABLE
ALTER TABLE
DROP TABLE
TRUNCATE TABLE
INSERT statement
UPDATE statement
DELETE statement
4. Aggregate Functions:
GROUP BY clause
HAVING clause
5. Data Constraints:
Primary Key
Foreign Key
Unique
NOT NULL
CHECK
6. Joins:
INNER JOIN
LEFT JOIN
RIGHT JOIN
Self Join
Cross Join
7. Subqueries:
Nested subqueries
Correlated subqueries
9. Views:
Creating views
Modifying views
Dropping views
10. Indexes:
Creating indexes
In SQL Server, a subquery is a query nested inside another query, and it can be used to retrieve data that
will be used by the main query. Subqueries can appear in various parts of a SQL statement, such as the
SELECT, FROM, WHERE, and HAVING clauses. Here are examples of subqueries in different contexts:
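1. Subquery in the SELECT Clause:
Syntax:
SELECT column1, (SELECT MAX(column2) FROM table2) AS max_value
FROM table1;
2. Subquery in the FROM Clause:
Syntax:
SELECT subquery.column1
FROM (
SELECT column1 FROM table1 WHERE column2 > 10
) AS subquery;
3. Subquery in the WHERE Clause:
Syntax:
SELECT column1
FROM table1
WHERE column2 IN (SELECT column2 FROM table2);
4. Subquery in a JOIN:
Use a subquery to generate a result set that is then joined with another table.
Syntax:
SELECT t1.column1, sub.column2
FROM table1 AS t1
INNER JOIN (
SELECT column1, column2 FROM table2
) AS sub ON t1.column1 = sub.column1;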
5. Correlated Subqueries:
Syntax:
SELECT column1
FROM table1 AS t1
WHERE column2 > (SELECT AVG(t2.column2) FROM table1 AS t2 WHERE t2.category = t1.category);
6. Subqueries with EXISTS:
Syntax:
SELECT column1
FROM table1
WHERE EXISTS (SELECT 1 FROM table2 WHERE table2.column1 = table1.column1);
Syntax:
SELECT column1
FROM table1
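The subquery forms described in this section can be exercised end to end with a short sketch. It assumes Python's built-in sqlite3 module instead of SQL Server, and the products and discontinued tables are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (name TEXT, category TEXT, price REAL);
    INSERT INTO products VALUES
        ('Pen', 'Stationery', 2), ('Notebook', 'Stationery', 6),
        ('Mouse', 'Electronics', 20), ('Monitor', 'Electronics', 180);
    CREATE TABLE discontinued (name TEXT);
    INSERT INTO discontinued VALUES ('Mouse');
""")

# Scalar subquery in WHERE: products priced above the overall average.
above_avg = conn.execute("""
    SELECT name FROM products
    WHERE price > (SELECT AVG(price) FROM products)
""").fetchall()

# Subquery in FROM (a derived table): average price per category.
per_category = conn.execute("""
    SELECT category, avg_price
    FROM (SELECT category, AVG(price) AS avg_price
          FROM products GROUP BY category) AS sub
    ORDER BY category
""").fetchall()

# Correlated subquery: the inner query refers to the current outer row.
above_category_avg = conn.execute("""
    SELECT p.name FROM products AS p
    WHERE p.price > (SELECT AVG(p2.price) FROM products AS p2
                     WHERE p2.category = p.category)
    ORDER BY p.name
""").fetchall()

# NOT EXISTS: keep rows for which the subquery returns no row at all.
still_sold = conn.execute("""
    SELECT p.name FROM products AS p
    WHERE NOT EXISTS (SELECT 1 FROM discontinued AS d WHERE d.name = p.name)
    ORDER BY p.name
""").fetchall()

print(above_avg, per_category, above_category_avg, still_sold)
```

The correlated version compares each product with the average of its own category, which is why 'Notebook' qualifies there but not in the first query.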
SQL:
1. Basic
2. Intermediate
3. Advanced
• Window functions (OVER, PARTITION BY, ROW_NUMBER, RANK, DENSE_RANK, LEAD, LAG)
String Functions:
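String functions are built-in functions: each accepts an input, performs a calculation, and returns a
value.
1) UPPER(string/colname)
-used to convert a string to uppercase
2) LOWER(string/colname)
-used to convert a string to lowercase
3) LEN(string)
-used to return the number of characters in a string (trailing spaces are not counted)
4) LEFT(string,len)
-used to extract characters from left side
5) RIGHT(string,len)
-used to extract characters from right side
6) REPLICATE() :-
=> used to repeat a string the given number of times.
REPLICATE(str,no of times)
7) REPLACE() :-
=> used to replace every occurrence of str2 in str1 with str3.
REPLACE(str1,str2,str3)
8) TRANSLATE() :-
=> used to replace each character of str1 that appears in str2 with the character at the same position
in str3.
TRANSLATE(str1,str2,str3)
9) SUBSTRING() :-
=> used to extract part of the string starting from specific position.
SUBSTRING(string,start,length)
10) CHARINDEX() :-
=> used to return the position of the first occurrence of char in string (0 if it is not found).
CHARINDEX(char,string,[start])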
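A quick way to experiment with these functions without a SQL Server instance is sketched below. It assumes Python's built-in sqlite3 module, and SQLite names several functions differently, so each line notes the T-SQL call it approximates; REPLICATE and TRANSLATE are left out because this sketch does not rely on SQLite equivalents for them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite spellings on the left, the T-SQL function they approximate on the right.
row = conn.execute("""
    SELECT
        UPPER('sql server'),                 -- T-SQL: UPPER('sql server')
        LOWER('SQL Server'),                 -- T-SQL: LOWER('SQL Server')
        LENGTH('SQL Server'),                -- T-SQL: LEN('SQL Server')
        SUBSTR('SQL Server', 1, 3),          -- T-SQL: LEFT('SQL Server', 3)
        SUBSTR('SQL Server', -6),            -- T-SQL: RIGHT('SQL Server', 6)
        REPLACE('SQL Server', 'SQL', 'My'),  -- T-SQL: REPLACE('SQL Server', 'SQL', 'My')
        SUBSTR('SQL Server', 5, 3),          -- T-SQL: SUBSTRING('SQL Server', 5, 3)
        INSTR('SQL Server', 'S')             -- T-SQL: CHARINDEX('S', 'SQL Server')
""").fetchone()

labels = ["upper", "lower", "length", "left", "right", "replace", "substring", "position"]
result = dict(zip(labels, row))
print(result)
```

Positions are 1-based in both dialects, so the first 'S' is reported at position 1.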
"A Candidate failed in Amazon Data Analytics interview - these were the questions he couldn't
answer"
Here are 7 of the questions I was asked (and failed to answer some as a fresher):
1. What SQL statements would you use to get all unique values from a column?
3. Can you explain the difference between a left join and an inner join?
Answer: A left join returns all the rows from the left table and the matching rows from the right table
(with NULLs where there is no match), while an inner join only returns rows that match in both tables.
4. How can you find the second-highest salary from an employee table in SQL?
Answer: SELECT MAX(salary) FROM employee WHERE salary < (SELECT MAX(salary) FROM employee);
5. Explain the difference between the COUNT() and SUM() functions in SQL?
Answer: COUNT(*) returns the number of rows, and COUNT(column) returns the number of non-NULL values in
that column, while SUM() returns the sum of the values in a specific numeric column.
6. How would you find duplicate values in a table?
Answer: SELECT column_name(s) FROM table_name GROUP BY column_name(s) HAVING COUNT(*) > 1;
7. How would you order a table by multiple columns in descending order in SQL?
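The queries behind several of these questions (unique values, second-highest salary, duplicates with HAVING COUNT(*) > 1, and ordering by multiple columns in descending order) can be checked against a tiny table. This is a sketch assuming Python's built-in sqlite3 module; the employee rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (name TEXT, department TEXT, salary INTEGER);
    INSERT INTO employee VALUES
        ('Asha', 'Sales', 900), ('Ben', 'Sales', 700),
        ('Chen', 'IT', 900), ('Dina', 'IT', 800);
""")

# Unique values from a column.
departments = conn.execute(
    "SELECT DISTINCT department FROM employee ORDER BY department").fetchall()

# Second-highest salary: the largest value below the overall maximum.
second_highest = conn.execute("""
    SELECT MAX(salary) FROM employee
    WHERE salary < (SELECT MAX(salary) FROM employee)
""").fetchone()[0]

# Values that occur more than once.
duplicate_salaries = conn.execute("""
    SELECT salary, COUNT(*) FROM employee
    GROUP BY salary HAVING COUNT(*) > 1
""").fetchall()

# Order by several columns, each in descending order.
ordered = conn.execute("""
    SELECT name FROM employee
    ORDER BY salary DESC, name DESC
""").fetchall()

print(departments, second_highest, duplicate_salaries, ordered)
```

Two employees share the top salary of 900, and the MAX-below-MAX query still returns 800 because it compares values, not rows.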
Write an SQL query to find the name of the product with the highest price in each country.
Write an SQL query to calculate the total transaction amount for each customer for the current year. The
output should contain Customer_Name and the total amount.
1- Top N products by sales, top N products within each category, top N employees by salary, etc.
2- Year-over-year growth, YoY growth for each category, products with higher sales than the previous
month, etc.
3- Running sales over months, rolling N-month sales, the same within each category, etc.
4- Pivot rows to columns, e.g. year-wise sales for each category in separate columns, etc.
5- Number of records after different kinds of joins.
These 5 themes are very common; interviewers can ask variations of them and some
follow-up questions.
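The first four themes map onto window functions and conditional aggregation. The sketch below assumes Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer) and a made-up sales table; a real interview dataset will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (month TEXT, category TEXT, product TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('2024-01', 'Food', 'Rice', 100), ('2024-01', 'Food', 'Tea', 60),
        ('2024-01', 'Toys', 'Kite', 40),
        ('2024-02', 'Food', 'Rice', 120), ('2024-02', 'Toys', 'Kite', 30);
""")

# Theme 1: top product within each category (top N with N = 1).
top_per_category = conn.execute("""
    WITH totals AS (
        SELECT category, product, SUM(amount) AS total
        FROM sales GROUP BY category, product
    ), ranked AS (
        SELECT category, product, total,
               ROW_NUMBER() OVER (PARTITION BY category ORDER BY total DESC) AS rn
        FROM totals
    )
    SELECT category, product, total FROM ranked WHERE rn = 1 ORDER BY category
""").fetchall()

# Themes 2 and 3: compare with the previous period (LAG) and keep a running total.
monthly = conn.execute("""
    WITH m AS (SELECT month, SUM(amount) AS total FROM sales GROUP BY month)
    SELECT month, total,
           LAG(total) OVER (ORDER BY month) AS previous_total,
           SUM(total) OVER (ORDER BY month) AS running_total
    FROM m ORDER BY month
""").fetchall()

# Theme 4: pivot rows to columns with conditional aggregation.
pivot = conn.execute("""
    SELECT category,
           SUM(CASE WHEN month = '2024-01' THEN amount ELSE 0 END) AS jan,
           SUM(CASE WHEN month = '2024-02' THEN amount ELSE 0 END) AS feb
    FROM sales GROUP BY category ORDER BY category
""").fetchall()

print(top_per_category, monthly, pivot)
```

Changing rn = 1 to rn <= N gives the general top-N variant, and swapping ROW_NUMBER for RANK or DENSE_RANK changes how ties are treated.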
Day-1
1. *What is SQL?*
- SQL stands for Structured Query Language. It is a standard language used for managing and
manipulating relational databases.
3. *What is the difference between DELETE and TRUNCATE?*
- DELETE is a DML command used to remove specific rows from a table based on a condition, while
TRUNCATE is a DDL command used to remove all rows from a table without logging individual row
deletions.
4. *What is a primary key?*
- A primary key is a unique identifier for each row in a table. It ensures that each row in a table is
uniquely identifiable and cannot have a NULL value.
5. *What is a foreign key?*
- A foreign key is a column or a combination of columns that establishes a link between data in two
tables. It enforces referential integrity between the two related tables.
6. *What is the difference between the WHERE and HAVING clauses?*
- The WHERE clause filters records before they are grouped and sorted, while the HAVING clause
filters records after they have been grouped.
7. *What is normalization?*
- Normalization is the process of organizing data in a database to minimize redundancy and dependency.
It involves dividing large tables into smaller tables and defining relationships between them.
8. *What is an index?*
- An index is a database object used to improve the speed of data retrieval operations on a table. It
allows faster retrieval of data by creating an ordered list of key values.
9. *What is a join?*
- A join is used to combine rows from two or more tables based on a related column between them.
There are different types of joins, including INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN.
10. *What is the difference between INNER JOIN and OUTER JOIN?*
- INNER JOIN returns only the rows that have matching values in both tables, while an OUTER JOIN (LEFT,
RIGHT, or FULL) also returns the unmatched rows from one or both tables and fills in NULL values for
the columns that do not have a match.
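The INNER JOIN versus OUTER JOIN answer is easiest to remember by counting rows. Here is a minimal sketch, assuming Python's built-in sqlite3 module and two invented tables (customers and orders); only LEFT JOIN is shown as the outer join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders VALUES (10, 1, 50), (11, 1, 20), (12, 2, 70);
""")

# INNER JOIN: only customers that have at least one matching order.
inner_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

# LEFT JOIN: every customer; order columns are NULL (None) when nothing matches.
left_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

print(len(inner_rows), len(left_rows), left_rows[-1])
```

'Chen' has no orders, so the LEFT JOIN returns one more row than the INNER JOIN, with None in the amount column.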
Basic Concepts for Data Analyst (SQL)
Index in SQL
In SQL, an index is a database object that improves the speed of data retrieval operations on a database
table. Indexes can be created on one or more columns of a table, and they work much like the index in a
book, making it faster to find specific information. Here are some key points about indexes in SQL:
1. Types of Indexes:
Clustered Index: Determines the physical order of data in a table. There can only be one clustered index
per table.
Non-clustered Index: Creates a separate structure for the index, keeping a pointer to the actual data
rows.
2. Creating an Index:
The CREATE INDEX statement is used to create an index. Here's a simple example:
Syntax:
CREATE INDEX index_name ON table_name (column1);
3. Using Indexes:
Indexes are automatically used by the database engine to speed up data retrieval operations, especially
in SELECT statements and WHERE clauses.
However, having too many indexes on a table can impact the performance of data modification
operations (such as INSERT, UPDATE, DELETE), as the indexes need to be maintained.
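4. Clustered vs. Non-clustered Index:
A clustered index stores the table rows themselves in key order, whereas non-clustered indexes provide
a separate structure for the index, with pointers to the actual data rows.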
5. Composite Index:
An index that involves more than one column. It can improve query performance for specific
combinations of columns.
Syntax:
CREATE INDEX index_name ON table_name (column1, column2);
6. Unique Index:
Ensures that the values in the indexed columns are unique. It is often used to enforce the uniqueness of
a column or combination of columns.
Syntax:
CREATE UNIQUE INDEX index_name ON table_name (column1);
7. Dropping an Index:
Syntax:
DROP INDEX index_name ON table_name;
Regularly analyze and optimize indexes based on the changing workload and data distribution.
Indexes play a crucial role in database performance optimization, helping to speed up query execution.
However, they should be used judiciously, and their maintenance should be considered to ensure
optimal database performance.
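To watch an index change how a query is executed, the sketch below uses Python's built-in sqlite3 module and its EXPLAIN QUERY PLAN statement. This is SQLite behaviour standing in for SQL Server's execution plans, and the employees table, the index names, and the helper plan are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, email TEXT, department TEXT, salary INTEGER);
    INSERT INTO employees (email, department, salary) VALUES
        ('a@example.com', 'Sales', 700), ('b@example.com', 'IT', 900);
""")

def plan(sql):
    """Return the plan text SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

query = "SELECT id FROM employees WHERE department = 'IT' AND salary > 800"
plan_before = plan(query)  # no usable index yet, so the table is scanned

# Composite index on the two filtered columns, and a unique index on email.
conn.execute("CREATE INDEX idx_emp_dept_salary ON employees (department, salary)")
conn.execute("CREATE UNIQUE INDEX idx_emp_email ON employees (email)")
plan_after = plan(query)

# The unique index also enforces uniqueness on writes.
try:
    conn.execute(
        "INSERT INTO employees (email, department, salary) VALUES ('a@example.com', 'HR', 500)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# SQLite form of DROP INDEX; SQL Server also names the table.
conn.execute("DROP INDEX idx_emp_dept_salary")
print(plan_before, "->", plan_after, duplicate_rejected)
```

The exact plan wording varies between SQLite versions, so the useful signal is whether the index name appears in the plan.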
*What is a foreign key?*
- A foreign key is a field or a combination of fields in one table that refers to the primary key in another
table. It establishes a relationship between two tables.
*What is the difference between DELETE and TRUNCATE?*
- DELETE is a DML (Data Manipulation Language) command used to remove rows from a table based on
specified criteria, while TRUNCATE is a DDL (Data Definition Language) command used to remove all
rows from a table, but it does not log individual row deletions.
*What is the difference between INNER JOIN and LEFT JOIN?*
- INNER JOIN returns only the rows where there is a match in both tables, while LEFT JOIN returns all the
rows from the left table and the matched rows from the right table. If there is no match, NULL values are
returned for the columns from the right table.
*What is a subquery?*
- A subquery, also known as a nested query or inner query, is a query nested within another query. It can
be used to return data that will be used in the main query's condition.
*What is a view?*
- A view is a virtual table based on the result of a SELECT query. It can be used to simplify complex
queries, provide an additional level of security by restricting access to certain columns, or hide the
complexity of the underlying data structure.
*What is ACID in database transactions?*
- ACID stands for Atomicity, Consistency, Isolation, and Durability. Atomicity ensures that either all the
operations within a transaction are completed successfully, or none of them are applied. Consistency
ensures that the database remains in a consistent state before and after the transaction. Isolation
ensures that the execution of multiple transactions concurrently does not result in data inconsistency.
Durability ensures that once a transaction is committed, its changes are permanent and survive
system failures.
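Atomicity in particular is easy to demonstrate. The following sketch assumes Python's built-in sqlite3 module; the accounts table, the CHECK rule, and the transfer function are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0));
    INSERT INTO accounts VALUES ('Asha', 100), ('Ben', 50);
""")

def transfer(sender, receiver, amount):
    """Move money in one transaction; roll back every step if any step fails."""
    try:
        with conn:  # commits on success, rolls back when an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender))
        return True
    except sqlite3.IntegrityError:
        return False

first_ok = transfer("Asha", "Ben", 30)    # both updates succeed
second_ok = transfer("Asha", "Ben", 500)  # the debit breaks the CHECK, so the credit is undone too
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(first_ok, second_ok, balances)
```

In the failed transfer the credit to Ben had already been applied inside the transaction, yet it does not survive the rollback.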
1. What is a primary key, and why is it important?
A primary key is a unique identifier for each record in a database table. It is important because it ensures
that each record can be uniquely identified and helps maintain data integrity by preventing duplicate or
null values.
2. Can you explain the difference between INNER JOIN and OUTER JOIN in SQL?
INNER JOIN returns only the rows that have matching values in both tables, while a LEFT or RIGHT OUTER
JOIN returns all rows from one table and the matched rows from the other table (or null values if there is
no match); a FULL OUTER JOIN returns all rows from both tables.
3. What is normalization, and why is it important?
Normalization is the process of organizing data in a database to reduce redundancy and dependency. It is
important because it helps improve data integrity, reduce storage space, and make data maintenance
easier.
4. How do you handle missing data in SQL queries?
You can handle missing data in SQL queries by using functions like COALESCE or IFNULL (ISNULL in SQL
Server) to replace null values with a default value, or by using the IS NULL or IS NOT NULL operators to
filter out records with missing data.
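A short sketch of both approaches to missing data, assuming Python's built-in sqlite3 module and an invented contacts table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (name TEXT, phone TEXT);
    INSERT INTO contacts VALUES ('Asha', '555-0100'), ('Ben', NULL);
""")

# COALESCE returns its first non-NULL argument, so NULL becomes a default value.
filled = conn.execute(
    "SELECT name, COALESCE(phone, 'not provided') FROM contacts ORDER BY name"
).fetchall()

# IS NULL finds the rows with missing data.
missing = conn.execute("SELECT name FROM contacts WHERE phone IS NULL").fetchall()

# A comparison with = NULL is never true, so this returns no rows at all.
equals_null = conn.execute("SELECT name FROM contacts WHERE phone = NULL").fetchall()

print(filled, missing, equals_null)
```

The last query is the common mistake: NULL has to be tested with IS NULL, not with the equality operator.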
In the world of SQL interviews, understanding the nuances between the WHERE and HAVING clauses can
be a game-changer!
🔹 WHERE Clause: Think of it as your initial filter when querying a database. This clause is used to extract
records that fulfill a specified condition or set of conditions. It operates on individual rows before the
data is grouped or aggregated.
🔹 HAVING Clause: Here's where things get interesting! The HAVING clause is used in combination with
GROUP BY to filter the results returned by GROUP BY based on specified conditions. It operates on
aggregated data, allowing you to filter groups of rows based on aggregate values.
🔍 Key Difference:
Let's say we have a table of employees with their respective salaries. If we want to find departments
where the average salary is above a certain threshold:
- We'd use WHERE to filter individual employee rows before they are grouped.
- We'd use HAVING to filter groups of departments based on their average salary.
Understanding these nuances showcases your depth of understanding SQL query optimization and data
manipulation. It demonstrates your ability to efficiently extract the precise data you need, even in
complex scenarios.
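The department example can be written out as one query that uses both clauses. This is a sketch assuming Python's built-in sqlite3 module; the employees rows, the active flag, and the 600 threshold are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER, active INTEGER);
    INSERT INTO employees VALUES
        ('Asha', 'Sales', 900, 1), ('Ben', 'Sales', 500, 1),
        ('Chen', 'IT', 1000, 1), ('Dina', 'IT', 2000, 0), ('Eli', 'HR', 400, 1);
""")

rows = conn.execute("""
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    WHERE active = 1            -- row filter, applied before grouping
    GROUP BY department
    HAVING AVG(salary) > 600    -- group filter, applied after aggregation
    ORDER BY department
""").fetchall()

print(rows)
```

Dina is removed by WHERE before the averages are computed, and HR is removed by HAVING because its average falls below the threshold.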
Imagine you're a chef in a restaurant, and you want to categorize dishes based on their spice level using
SQL:
You have a list of dishes and you want to classify them as Mild, Medium, or Spicy based on their spice
level.
You tell your kitchen staff: "If the dish is not spicy at all, call it Mild. If it's a bit spicy, call it Medium. If it's
really spicy, call it Spicy."
Example:
SELECT
DishName,
SpiceLevel,
CASE
WHEN SpiceLevel = 'Low' THEN 'Mild'
WHEN SpiceLevel = 'Medium' THEN 'Medium'
ELSE 'Spicy'
END AS SpiceCategory
FROM Menu;
Here, you're using the CASE tool to create a new column (SpiceCategory) based on the spice level of
each dish.
You can use this spice level tool in different parts of your kitchen. Maybe you want to sort the dishes on
the menu based on their spice level, or you want to update the labels of existing dishes.
Example:
UPDATE Menu
SET SpiceCategory =
CASE
WHEN SpiceLevel = 'Low' THEN 'Mild'
WHEN SpiceLevel = 'Medium' THEN 'Medium'
ELSE 'Spicy'
END;
Think of the CASE statement as your spice guide. If it matches a certain spice level, it tells you how to
sort dishes or update their labels.
Example:
SELECT
DishName,
SpiceLevel
FROM Menu
ORDER BY
CASE
WHEN SpiceLevel = 'Low' THEN 1
WHEN SpiceLevel = 'Medium' THEN 2
ELSE 3
END;
Here, you're telling SQL to sort dishes. If it's Low spice, it comes first; if it's Medium, it comes next, and
everything else comes last.
In the kitchen of SQL, the CASE statement is like your spice level guide, helping you classify, sort, or
update dishes based on certain conditions. It's a handy tool for making decisions in your data world!
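The three uses of CASE (classify, update, sort) can be run together in a small sketch. It assumes Python's built-in sqlite3 module, a Menu table whose SpiceLevel holds the text values 'Low', 'Medium', and 'High', and three made-up dishes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Menu (DishName TEXT, SpiceLevel TEXT, SpiceCategory TEXT);
    INSERT INTO Menu (DishName, SpiceLevel) VALUES
        ('Vindaloo', 'High'), ('Korma', 'Low'), ('Tikka', 'Medium');
""")

# CASE inside UPDATE: store the label derived from the spice level.
conn.execute("""
    UPDATE Menu
    SET SpiceCategory = CASE
        WHEN SpiceLevel = 'Low' THEN 'Mild'
        WHEN SpiceLevel = 'Medium' THEN 'Medium'
        ELSE 'Spicy'
    END
""")

# CASE inside ORDER BY: sort by a custom order instead of alphabetically.
rows = conn.execute("""
    SELECT DishName, SpiceCategory
    FROM Menu
    ORDER BY CASE
        WHEN SpiceLevel = 'Low' THEN 1
        WHEN SpiceLevel = 'Medium' THEN 2
        ELSE 3
    END
""").fetchall()

print(rows)
```

An alphabetical sort on SpiceLevel would put 'High' first, which is why the ORDER BY uses CASE to map each level to a number.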
2. Enter 0 in the preferred salary part of your profile: Set your preferred salary to 0 to attract more
recruiters. You can state your expectation in the interview.
3. Update your profile daily from 9 AM to 10 AM: Make minor changes and re-upload your resume daily
to stay active.
4. Write a relevant profile headline: Use a headline like "Data analyst with 3 years of experience in SQL,
Python, Tableau, advanced MS Excel, and more."
5. Choose "15 days or less" in the notice period section: Indicate your availability as "15 days or less" for
a quicker response from employers.
6. Add relevant skillsets in the key skills section: List skills that match the job roles you are applying for.
7. Showcase your best projects: Highlight projects that are relevant to the job roles you are seeking.
8. Consider a basic premium subscription: A premium subscription can increase your profile's visibility
(optional).
9. Highlight achievements with metrics: Emphasize your successes with quantifiable metrics, for
example "increased GMV by 35 CR or 15%."
10. Use keywords effectively: Incorporate industry-specific keywords throughout your profile.
11. Add a professional photo: A professional photo can create a positive impression.
12. Customize your resume for each application: Tailor your resume to highlight the most relevant
experience and skills for each job.
13. Regularly update your profile with new skills and projects: Keep your profile current by adding new
accomplishments and skills as they develop.