SQL1 Merged

Uploaded by

krishnakantuu007

CASE & WHEN

Find out which vendors primarily sell fresh products and which don’t. Add an
identifier column for the same.

SELECT
vendor_id,
vendor_name,
vendor_type,
CASE
WHEN LOWER(vendor_type) LIKE "%fresh%"
THEN 1
ELSE 0
END AS is_fresh
FROM `farmers_market.vendor`

Alternate: Using IF()

SELECT
vendor_id,
vendor_name,
vendor_type,
IF(UPPER(vendor_type) LIKE "%FRESH%", "Fresh", "Not Fresh") AS is_fresh
FROM `farmers_market.vendor`;
Put the total cost to customer purchases into bins of:
- Under $5.00
- 5 - 9.99
- 10 - 19.99
- 20 and over
SELECT
customer_id,
market_date,
quantity * cost_to_customer_per_qty AS total_amt,
CASE
WHEN quantity * cost_to_customer_per_qty < 5
THEN "Under $5"
WHEN quantity * cost_to_customer_per_qty < 10
THEN "$5 - $9.99"
WHEN quantity * cost_to_customer_per_qty < 20
THEN "$10 - $19.99"
ELSE "$20 and over"
END AS price_bins
FROM `farmers_market.customer_purchases`
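A minimal sketch of why bin boundaries deserve care. It assumes a hypothetical `purchases` table and runs on SQLite through Python's `sqlite3` module rather than on BigQuery. Bins written with `BETWEEN 5 AND 9.99` leave a gap for amounts such as 9.995, which then fall through to the `ELSE` branch. Half-open comparisons (`< 10`, `< 20`) have no such gap.

```python
import sqlite3

# Hypothetical sample amounts; 9.995 sits between the 9.99 and 10 boundaries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (total_amt REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?)", [(4.5,), (9.995,), (12.0,), (25.0,)]
)

# Bins defined with closed BETWEEN ranges.
between_sql = """
SELECT total_amt,
       CASE
         WHEN total_amt < 5 THEN 'Under $5'
         WHEN total_amt BETWEEN 5 AND 9.99 THEN '$5 - $9.99'
         WHEN total_amt BETWEEN 10 AND 19.99 THEN '$10 - $19.99'
         ELSE '$20 and over'
       END AS price_bin
FROM purchases
"""

# Bins defined with half-open comparisons; CASE stops at the first match.
half_open_sql = """
SELECT total_amt,
       CASE
         WHEN total_amt < 5 THEN 'Under $5'
         WHEN total_amt < 10 THEN '$5 - $9.99'
         WHEN total_amt < 20 THEN '$10 - $19.99'
         ELSE '$20 and over'
       END AS price_bin
FROM purchases
"""

between_bins = dict(conn.execute(between_sql).fetchall())
half_open_bins = dict(conn.execute(half_open_sql).fetchall())

print(between_bins[9.995])    # lands in the ELSE branch
print(half_open_bins[9.995])  # lands in the intended bin
```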
Count the number of purchases each customer made per market date.

SELECT
customer_id,
market_date,
COUNT(*) AS num_orders
FROM `farmers_market.customer_purchases`
GROUP BY customer_id, market_date
ORDER BY market_date, customer_id
Calculate the total quantity purchased by each customer per market_date.

SELECT
customer_id,
market_date,
SUM(quantity) AS total_quantity
FROM `farmers_market.customer_purchases`
GROUP BY customer_id, market_date
ORDER BY market_date, customer_id
How many different kinds of products were purchased by each customer on each
market date?

SELECT
customer_id,
market_date,
COUNT(DISTINCT product_id) num_products
FROM `farmers_market.customer_purchases`
GROUP BY customer_id, market_date
ORDER BY market_date, customer_id
Group By & Agg Queries
Calculate the total price paid by customer_id 3 per market_date.

SELECT
customer_id,
market_date,
SUM(quantity * cost_to_customer_per_qty) AS total_amt
FROM `farmers_market.customer_purchases`
WHERE customer_id = 3
GROUP BY market_date, customer_id
Determine how much each customer has paid to each vendor regardless of the date.

SELECT
customer_id,
vendor_id,
SUM(quantity * cost_to_customer_per_qty) AS total_amt
FROM `farmers_market.customer_purchases`
GROUP BY customer_id, vendor_id
Give me the least and most expensive product.

Least - MIN(col)

Highest - MAX(col)

SELECT
MIN(original_price) AS least_expensive,
MAX(original_price) AS most_expensive
FROM `farmers_market.vendor_inventory`
Least and most expensive price of each vendor.
SELECT
vendor_id,
MIN(original_price) AS least_expensive,
MAX(original_price) AS most_expensive
FROM `farmers_market.vendor_inventory`
GROUP BY vendor_id
Count how many products were on sale on each market date.

SELECT
market_date,
COUNT(product_id) AS num_products
FROM `farmers_market.vendor_inventory`
GROUP BY market_date
Count the number of different products brought to the market between
2019-04-03 and 2019-05-16 by each vendor.

SELECT
vendor_id,
COUNT(DISTINCT product_id) AS num_products
FROM `farmers_market.vendor_inventory`
WHERE market_date BETWEEN "2019-04-03" and "2019-05-16"
GROUP BY vendor_id
Along with count, calculate the average of the original price of a product per
vendor.

SELECT
vendor_id,
COUNT(DISTINCT product_id) AS num_products,
AVG(original_price) AS avg_price
FROM `farmers_market.vendor_inventory`
WHERE market_date BETWEEN "2019-04-03" AND "2019-05-16"
GROUP BY vendor_id
Find the vendors who brought at least 100 items to the market between
2019-05-02 and 2019-05-16.

SELECT
vendor_id,
SUM(quantity) AS total_quantity
FROM `farmers_market.vendor_inventory`
WHERE market_date BETWEEN "2019-05-02" and "2019-05-16"
GROUP BY vendor_id
HAVING total_quantity >= 100
Joins-1 Queries
List all the products along with their product category names.

SELECT
p.product_id,
p.product_name,
p.product_category_id,
pc.product_category_name
FROM `farmers_market.product` AS p
LEFT JOIN `farmers_market.product_category` AS pc
ON p.product_category_id = pc.product_category_id
Get a list of customers’ zip codes who made purchases on 2019-04-06

SELECT
DISTINCT
c.customer_id,
c.customer_zip
FROM `farmers_market.customer` AS c
INNER JOIN `farmers_market.customer_purchases` AS cp
ON c.customer_id = cp.customer_id
WHERE cp.market_date = "2019-04-06"
Find the customers from the database who have never made a purchase
from the market.

SELECT
c.customer_id
FROM `farmers_market.customer` AS c
LEFT JOIN `farmers_market.customer_purchases` AS cp
ON c.customer_id = cp.customer_id
WHERE cp.customer_id IS NULL
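A small sketch of the LEFT JOIN ... IS NULL pattern (an anti-join). It uses hypothetical sample rows and runs on SQLite through Python's `sqlite3` module rather than on BigQuery. Customers with no matching purchase row come back with NULL on the right side, and the WHERE clause keeps only those.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: customer 2 has never purchased anything.
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER);
CREATE TABLE customer_purchases (customer_id INTEGER, market_date TEXT);
INSERT INTO customer VALUES (1), (2), (3);
INSERT INTO customer_purchases VALUES
  (1, '2019-04-06'), (1, '2019-04-10'), (3, '2019-04-06');
""")

# Anti-join: keep only customers whose join found no purchase row.
never_purchased = [row[0] for row in conn.execute("""
SELECT c.customer_id
FROM customer AS c
LEFT JOIN customer_purchases AS cp
  ON c.customer_id = cp.customer_id
WHERE cp.customer_id IS NULL
ORDER BY c.customer_id
""")]

# The same question phrased with NOT EXISTS, for comparison.
not_exists = [row[0] for row in conn.execute("""
SELECT c.customer_id
FROM customer AS c
WHERE NOT EXISTS (
  SELECT 1 FROM customer_purchases AS cp
  WHERE cp.customer_id = c.customer_id
)
ORDER BY c.customer_id
""")]

print(never_purchased, not_exists)
```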
JOINS - 3
Find out all customers who have either not made any purchase or they have
deleted their accounts.

SELECT
c.customer_id,
"New Customer" AS customer_type
FROM `farmers_market.customer` AS c
LEFT JOIN `farmers_market.customer_purchases` AS cp
ON c.customer_id = cp.customer_id
WHERE cp.customer_id IS NULL
UNION DISTINCT
SELECT
cp.customer_id,
"Deleted Customer" AS customer_type
FROM `farmers_market.customer` AS c
RIGHT JOIN `farmers_market.customer_purchases` AS cp
ON c.customer_id = cp.customer_id
WHERE c.customer_id IS NULL
Find out all customers who have either not made any purchase or they have
deleted their accounts.

USING FULL JOIN

SELECT
c.customer_id AS new_customers,
cp.customer_id AS deleted_customers
FROM `farmers_market.customer` AS c
FULL JOIN `farmers_market.customer_purchases` AS cp
ON c.customer_id = cp.customer_id
WHERE cp.customer_id IS NULL OR c.customer_id IS NULL
Get details about all market booths and every vendor booth assignment for every
market date along with the vendor details.

SELECT
vba.market_date,
b.booth_number,
b.booth_type,
b.booth_price_level,
vba.vendor_id,
v.vendor_name
FROM `farmers_market.booth` AS b
LEFT JOIN `farmers_market.vba` AS vba
ON b.booth_number = vba.booth_number
LEFT JOIN `farmers_market.vendor` AS v
ON vba.vendor_id = v.vendor_id
Self Join - Display the name of the manager

SELECT
emp.employeeNumber,
emp.firstName AS employee_name,
mgr.firstName As manager_name
FROM employees AS emp
LEFT JOIN employees AS mgr
ON emp.reportsTo = mgr.employeeNumber
Window Functions - 1
Get the price of the most expensive item per vendor?
SELECT
vendor_id,
MAX(original_price) AS most_expensive_price
FROM `farmers_market.vendor_inventory`
GROUP BY vendor_id
Rank the products in each vendor’s inventory. Expensive products should get a
lower rank.

SELECT
vendor_id,
market_date,
product_id,
original_price,
ROW_NUMBER() OVER (PARTITION BY vendor_id ORDER BY original_price DESC) AS num_rankings,
RANK() OVER (PARTITION BY vendor_id ORDER BY original_price DESC) AS rnk,
DENSE_RANK() OVER (PARTITION BY vendor_id ORDER BY original_price DESC) AS dense_rnk
FROM `farmers_market.vendor_inventory`
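The three ranking functions differ only when prices tie. The sketch below uses hypothetical prices for a single vendor and runs on SQLite through Python's `sqlite3` module, which is assumed to be version 3.25 or later for window-function support. ROW_NUMBER() keeps counting through ties, RANK() repeats the rank and then skips, and DENSE_RANK() repeats the rank without skipping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendor_inventory (vendor_id INTEGER, original_price INTEGER)")
# Hypothetical prices for one vendor, with a tie at the top.
conn.executemany(
    "INSERT INTO vendor_inventory VALUES (?, ?)",
    [(1, 10), (1, 10), (1, 8), (1, 5)],
)

rankings = conn.execute("""
SELECT
  original_price,
  ROW_NUMBER() OVER (PARTITION BY vendor_id ORDER BY original_price DESC) AS row_num,
  RANK()       OVER (PARTITION BY vendor_id ORDER BY original_price DESC) AS rnk,
  DENSE_RANK() OVER (PARTITION BY vendor_id ORDER BY original_price DESC) AS dense_rnk
FROM vendor_inventory
ORDER BY row_num
""").fetchall()

for row in rankings:
    print(row)  # (price, row_num, rnk, dense_rnk)
```

Filtering on `dense_rnk = 1` or `rnk = 1` returns both tied rows, while filtering on the row number returns exactly one of them.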
Follow up: extract all rows where the ranking is 1 in the previous question.

SELECT *
FROM (
SELECT
vendor_id,
market_date,
product_id,
original_price,
ROW_NUMBER() OVER (PARTITION BY vendor_id ORDER BY original_price DESC) AS num_rankings,
RANK() OVER (PARTITION BY vendor_id ORDER BY original_price DESC) AS rnk,
DENSE_RANK() OVER (PARTITION BY vendor_id ORDER BY original_price DESC) AS dense_rnk
FROM `farmers_market.vendor_inventory` ) AS x
WHERE x.dense_rnk = 1
Being a vendor, you want to find which of your products were above the
average price on each market date.

SELECT *
FROM
(
SELECT
market_date,
vendor_id,
product_id,
original_price,
ROUND(AVG(original_price) OVER (PARTITION BY market_date), 2) AS avg_price
FROM `farmers_market.vendor_inventory`) AS x
WHERE x.original_price > x.avg_price
ORDER BY x.market_date
Count how many products each vendor brought to the market on each
date and display the count on each row.

SELECT
vendor_id,
market_date,
product_id,
COUNT(DISTINCT product_id) OVER (PARTITION BY market_date, vendor_id) AS count_of_products
FROM `farmers_market.vendor_inventory`
ORDER BY market_date, vendor_id
Window Functions - 2
Running total of sales, ordered by date.

SELECT
employee,
date,
sale,
SUM(sale) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS total_sales
FROM sales

Moving average of each sale with the rows immediately before and after it.

SELECT
employee,
date,
sale,
AVG(sale) OVER (ORDER BY date ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS dma
FROM `farmers_market.sales`
ORDER BY date, employee
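A worked example of the two window frames, using four hypothetical sales rows. It runs on SQLite (assumed 3.25 or later) through Python's `sqlite3` module, and the date column is named `sale_date` here purely for illustration. The first frame grows from the start of the data to the current row. The second covers the previous, current, and next rows, so it shrinks at the edges.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, sale INTEGER)")
# Hypothetical sales, one per day.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2023-01-01", 10), ("2023-01-02", 20), ("2023-01-03", 30), ("2023-01-04", 40)],
)

rows = conn.execute("""
SELECT
  sale_date,
  SUM(sale) OVER (ORDER BY sale_date
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS total_sales,
  AVG(sale) OVER (ORDER BY sale_date
                  ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS dma
FROM sales
ORDER BY sale_date
""").fetchall()

running_totals = [r[1] for r in rows]
moving_averages = [r[2] for r in rows]
print(running_totals)
print(moving_averages)
```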
Display each vendor’s booth assignment for each market_date alongside
their previous booth assignment.

SELECT
market_date,
vendor_id,
booth_number,
LAG(booth_number, 1) OVER (PARTITION BY vendor_id ORDER BY market_date) AS prev_booth
FROM `farmers_market.vba`
Determine which vendors are new or changing booths that day, so we can contact and
ensure smooth booth setup.
Check it for Date: 2019-04-10

SELECT *
FROM
(SELECT
market_date,
vendor_id,
booth_number,
LAG(booth_number, 1) OVER (PARTITION BY vendor_id ORDER BY market_date) AS prev_booth
FROM `farmers_market.vba`) AS x
WHERE x.market_date = "2019-04-10" AND
(x.booth_number != x.prev_booth OR x.prev_booth IS NULL)
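AND binds more tightly than OR, so a filter of the form `A AND B OR C` is evaluated as `(A AND B) OR C`. The sketch below shows the effect on a hypothetical, precomputed table of booth assignments, run on SQLite through Python's `sqlite3` module. Without parentheses around the OR, the date condition no longer applies to rows whose previous booth is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE x (market_date TEXT, vendor_id INTEGER,
                booth_number INTEGER, prev_booth INTEGER)
""")
# Hypothetical rows; prev_booth stands in for the LAG() result.
conn.executemany("INSERT INTO x VALUES (?, ?, ?, ?)", [
    ("2019-04-03", 1, 2, None),  # first assignment, but on another date
    ("2019-04-10", 1, 2, 2),     # unchanged booth
    ("2019-04-10", 2, 5, 4),     # changed booth
    ("2019-04-10", 3, 7, None),  # new vendor on the target date
])

# No parentheses: evaluated as (date AND changed) OR prev_booth IS NULL.
unparenthesized = conn.execute("""
SELECT market_date, vendor_id FROM x
WHERE market_date = '2019-04-10'
  AND booth_number != prev_booth OR prev_booth IS NULL
ORDER BY market_date, vendor_id
""").fetchall()

# Parentheses keep the date filter on both branches of the OR.
parenthesized = conn.execute("""
SELECT market_date, vendor_id FROM x
WHERE market_date = '2019-04-10'
  AND (booth_number != prev_booth OR prev_booth IS NULL)
ORDER BY market_date, vendor_id
""").fetchall()

print(unparenthesized)
print(parenthesized)
```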
Find out the total revenue on each date, and compare it with the previous date to
check whether it is higher or lower.

SELECT
market_date,
SUM(cost_to_customer_per_qty * quantity) AS revenue,
LAG(SUM(cost_to_customer_per_qty * quantity)) OVER (ORDER BY market_date) AS prev_rev
FROM `farmers_market.customer_purchases`
GROUP BY market_date
ORDER BY market_date
Display 3rd highest salary in each department across all rows.

SELECT *,
NTH_VALUE(salary, 3) OVER (PARTITION BY department ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS third_highest
FROM employee
Date & Time Fns
From market_start_datetime, extract the following:
- Date
- Time
- Day of month
- Month of year
- Year
- Hour
- Minute
- Quarter
- Day of week
- Week of year

SELECT
market_start_datetime,
EXTRACT(DATE FROM market_start_datetime) AS mkt_date,
EXTRACT(TIME FROM market_start_datetime) AS mkt_time,
EXTRACT(DAY FROM market_start_datetime) AS mkt_day,
EXTRACT(MONTH FROM market_start_datetime) AS mkt_month,
EXTRACT(YEAR FROM market_start_datetime) AS mkt_year,
EXTRACT(HOUR FROM market_start_datetime) AS mkt_hour,
EXTRACT(MINUTE FROM market_start_datetime) AS mkt_min,
EXTRACT(QUARTER FROM market_start_datetime) AS mkt_qtr,
EXTRACT(DAYOFWEEK FROM market_start_datetime) AS mkt_weekday,
EXTRACT(WEEK FROM market_start_datetime) AS mkt_week
FROM `farmers_market.datetime_demo`
What is the time 30 minutes after the market opened?

SELECT
market_start_datetime,
DATE_ADD(market_start_datetime, INTERVAL 30 MINUTE) AS mkt_plus_30_min,
DATE_SUB(market_start_datetime, INTERVAL 30 MINUTE) AS mkt_minus_30_min
FROM `farmers_market.datetime_demo`
Today is 17th July, find out how many orders were placed in the last 30 days.

SELECT
COUNT(DISTINCT order_id)
FROM orders
WHERE market_date BETWEEN DATE_SUB("2023-07-17", INTERVAL 30 DAY) AND
"2023-07-17"
What is the time period for which the data is recorded in the farmer’s
market dataset?

SELECT
MIN(market_date) AS first_prch_date,
MAX(market_date) AS last_prch_date,
DATE_DIFF(MAX(market_date), MIN(market_date), DAY)
FROM `farmers_market.customer_purchases`
Customer Profiling
- First purchase
- Most recent purchase
- How many times they have visited the market

SELECT
customer_id,
MIN(market_date) AS first_prch_date,
MAX(market_date) AS last_prch_date,
DATE_DIFF(MAX(market_date), MIN(market_date), DAY) AS customer_time_period,
COUNT(DISTINCT market_date) AS uniq_visits
FROM `farmers_market.customer_purchases`
GROUP BY customer_id
ORDER BY customer_id
Query Optimisation Tips & Tricks
Some less common but important SQL query optimization tips. Each one compares a regular
approach with an optimized approach and explains the difference:

1. Avoid using SELECT *:

- Regular Approach:
```sql
SELECT * FROM employees WHERE department = 'HR';
```
- Optimized Approach:
```sql
SELECT employee_id, first_name, last_name FROM employees WHERE department = 'HR';
```
Explanation: In the regular approach, we are selecting all columns using `SELECT *`, which
may retrieve unnecessary data, leading to increased I/O and slower query execution. By
explicitly listing only the required columns, as shown in the optimized approach, we reduce the
data volume and improve query performance.

2. Minimize Subqueries:
- Regular Approach:
```sql
SELECT product_name, (SELECT AVG(price) FROM sales WHERE product_id =
products.id) AS avg_price
FROM products;
```
- Optimized Approach:
```sql
SELECT p.product_name, AVG(s.price) AS avg_price
FROM products p
JOIN sales s ON p.id = s.product_id
GROUP BY p.product_name;
```
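Explanation: In the regular approach, a correlated subquery calculates the average price for
each product. This can be slow on large datasets because the subquery may run once per row.
The optimized approach uses a JOIN and GROUP BY, which usually performs better. The two
queries return the same result only when every product has at least one sale and product
names are unique: the inner JOIN drops products with no sales, so use a LEFT JOIN to keep them.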
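A quick check of how a correlated subquery and a JOIN with GROUP BY can differ in their results, using hypothetical `products` and `sales` rows on SQLite through Python's `sqlite3` module. A product with no sales appears with a NULL average in the subquery form, disappears under an inner JOIN, and reappears under a LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: 'Pear' has no sales at all.
conn.executescript("""
CREATE TABLE products (id INTEGER, product_name TEXT);
CREATE TABLE sales (product_id INTEGER, price REAL);
INSERT INTO products VALUES (1, 'Apple'), (2, 'Pear');
INSERT INTO sales VALUES (1, 2.0), (1, 4.0);
""")

subquery_rows = conn.execute("""
SELECT product_name,
       (SELECT AVG(price) FROM sales WHERE product_id = products.id) AS avg_price
FROM products
ORDER BY product_name
""").fetchall()

inner_join_rows = conn.execute("""
SELECT p.product_name, AVG(s.price) AS avg_price
FROM products p
JOIN sales s ON p.id = s.product_id
GROUP BY p.product_name
ORDER BY p.product_name
""").fetchall()

left_join_rows = conn.execute("""
SELECT p.product_name, AVG(s.price) AS avg_price
FROM products p
LEFT JOIN sales s ON p.id = s.product_id
GROUP BY p.product_name
ORDER BY p.product_name
""").fetchall()

print(subquery_rows)
print(inner_join_rows)
print(left_join_rows)
```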
3. Use EXISTS or JOIN instead of IN:
- Regular Approach:
```sql
SELECT order_id, order_date
FROM orders
WHERE customer_id IN (SELECT id FROM customers WHERE country = 'USA');
```
- Optimized Approach:
```sql
SELECT o.order_id, o.order_date
FROM orders o
WHERE EXISTS (SELECT 1 FROM customers c WHERE c.id = o.customer_id AND
c.country = 'USA');
```
Explanation: The regular approach uses the IN operator with a subquery, which can be
inefficient when the subquery returns a large result set. The optimized approach uses EXISTS
to check for the existence of a matching record, which is often faster.

4. Use UNION ALL instead of UNION:

- Regular Approach:
```sql
SELECT employee_id, first_name FROM employees
UNION
SELECT customer_id, full_name FROM customers;
```
- Optimized Approach:
```sql
SELECT employee_id, first_name FROM employees
UNION ALL
SELECT customer_id, full_name FROM customers;
```
Explanation: The regular approach uses UNION, which removes duplicate rows from the
result set. If you know that the queries' results are distinct, use UNION ALL, which avoids the
overhead of removing duplicates and improves performance.
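A small illustration of the difference in results, using hypothetical name lists on SQLite through Python's `sqlite3` module. UNION ALL is a safe substitute only when the inputs cannot overlap or when duplicates are acceptable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables that share one name ('Bob').
conn.executescript("""
CREATE TABLE employees (first_name TEXT);
CREATE TABLE customers (full_name TEXT);
INSERT INTO employees VALUES ('Ann'), ('Bob');
INSERT INTO customers VALUES ('Bob'), ('Cy');
""")

union_rows = sorted(r[0] for r in conn.execute(
    "SELECT first_name FROM employees UNION SELECT full_name FROM customers"
))
union_all_rows = sorted(r[0] for r in conn.execute(
    "SELECT first_name FROM employees UNION ALL SELECT full_name FROM customers"
))

print(union_rows)      # duplicates removed
print(union_all_rows)  # duplicates kept, no de-duplication step
```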

5. Be cautious with ORDER BY in subqueries:

- Regular Approach:
```sql
SELECT product_name
FROM products
WHERE product_id IN (SELECT product_id FROM sales ORDER BY sale_date DESC
LIMIT 10);
```
- Optimized Approach:
```sql
SELECT p.product_name
FROM products p
JOIN (
SELECT product_id
FROM sales
ORDER BY sale_date DESC
LIMIT 10
) s ON p.product_id = s.product_id;
```
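Explanation: The regular approach puts ORDER BY and LIMIT inside an IN subquery. Some engines
reject this form outright (MySQL, for example, does not support LIMIT in an IN subquery), and
others may handle it poorly. The optimized approach sorts and limits once in a derived table
and then JOINs to it. Because the JOIN returns one row per matching sale, add DISTINCT if a
product can appear more than once among the ten most recent sales.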

6. Use appropriate indexing:

- Regular Approach:
```sql
SELECT product_name
FROM products
WHERE product_type = 'Electronics';
```
- Optimized Approach:
```sql
ALTER TABLE products ADD INDEX idx_product_type (product_type);
```
```sql
SELECT product_name
FROM products
WHERE product_type = 'Electronics';
```
Explanation: The regular approach doesn't utilize any index, resulting in a full table scan. The
optimized approach adds an index on the `product_type` column, which allows for faster data
retrieval when filtering based on this column.

Remember, optimization techniques may vary based on the database system and specific use
cases. Always analyze the query execution plans, profile query performance, and test different
optimization approaches to determine the most effective strategy for your environment.
JOINS
Some uncommon yet important JOINs tips in SQL queries to optimize their performance:

1. Choose the Appropriate JOIN Type:

- Regular Approach:
```sql
SELECT orders.order_id, customers.customer_name
FROM orders, customers
WHERE orders.customer_id = customers.customer_id;
```
- Optimized Approach:
```sql
SELECT orders.order_id, customers.customer_name
FROM orders
JOIN customers ON orders.customer_id = customers.customer_id;
```
Explanation: The regular approach uses an implicit join (comma-separated tables in the
FROM clause), which is less readable and produces a Cartesian product if the join condition
is forgotten in the WHERE clause. The optimized approach uses an explicit INNER JOIN, which
keeps the join condition next to the tables it relates. Most modern optimizers build the same
plan for both forms, so the gain is mainly readability and safety.

2. Use LEFT JOIN Instead of Subqueries:

- Regular Approach:
```sql
SELECT customers.customer_name, (SELECT SUM(order_total) FROM orders WHERE
customer_id = customers.customer_id) AS total_spent
FROM customers;
```
- Optimized Approach:
```sql
SELECT c.customer_name, COALESCE(SUM(o.order_total), 0) AS total_spent
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_name;
```
Explanation: The regular approach uses a correlated subquery to calculate the total amount
spent by each customer, which can be inefficient. The optimized approach uses a LEFT JOIN
and GROUP BY to achieve the same result, providing better performance.

3. Be Mindful of JOIN Order:

- Regular Approach:
```sql
SELECT p.product_name, c.category_name
FROM products p
JOIN categories c ON p.category_id = c.category_id
JOIN suppliers s ON p.supplier_id = s.supplier_id;
```
- Optimized Approach:
```sql
SELECT p.product_name, c.category_name
FROM products p
JOIN suppliers s ON p.supplier_id = s.supplier_id
JOIN categories c ON p.category_id = c.category_id;
```
Explanation: The order of JOINs can impact performance. In the regular approach, the JOIN
with `categories` is written before the JOIN with `suppliers`; the optimized approach swaps
them. Joining the more selective table first can reduce the number of rows processed by later
JOINs, although many optimizers reorder inner JOINs on their own. Note that `suppliers`
contributes no columns to the result here, so its JOIN only filters out products without a
matching supplier.

4. Utilize Appropriate Indexes for JOIN Columns:

- Regular Approach:
```sql
SELECT product_name, category_name
FROM products
JOIN categories ON products.category_id = categories.category_id
WHERE categories.category_name = 'Electronics';
```
- Optimized Approach:
```sql
ALTER TABLE categories ADD INDEX idx_category_name (category_name);
```
```sql
SELECT product_name, category_name
FROM products
JOIN categories ON products.category_id = categories.category_id
WHERE categories.category_name = 'Electronics';
```
Explanation: The regular approach doesn't utilize any index for the
`categories.category_name` column, which can lead to a slower query. The optimized approach
adds an index on the `category_name` column, improving the join performance when filtering
based on this column.

5. Be Mindful of Multiple JOINs:

- Regular Approach:
```sql
SELECT o.order_id, c.customer_name, p.product_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id;
```
- Optimized Approach:
```sql
SELECT o.order_id, c.customer_name, p.product_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN (
SELECT oi.order_id, p.product_name
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
) p ON o.order_id = p.order_id;
```
Explanation: In the regular approach, multiple JOINs can lead to a complex execution plan.
The optimized approach uses a subquery to combine the `order_items` and `products` tables
before the final JOIN, potentially simplifying and improving the performance.

Remember to analyze query execution plans, profile query performance, and test different join
strategies to find the most efficient approach for your specific use case and database system.
Properly setting up JOINs is crucial for query optimization and can significantly impact the
overall performance of your SQL queries.
GROUP BY & Aggregations
Following are some tips for setting up GROUP BY clauses in SQL queries to optimize their
performance:

1. Use GROUP BY with Aggregates:

- Regular Approach:
```sql
SELECT department, employee_name
FROM employees
GROUP BY department;
```
- Optimized Approach:
```sql
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department;
```
Explanation: In the regular approach, `employee_name` is neither grouped nor aggregated,
which gives non-deterministic results in permissive engines and an error in strict ones. The
optimized approach uses the COUNT() aggregate function to count the number of employees in
each department.

2. Minimize the Number of Grouped Columns:

- Regular Approach:
```sql
SELECT department, employee_name, hire_date
FROM employees
GROUP BY department, employee_name, hire_date;
```
- Optimized Approach:
```sql
SELECT department, employee_name
FROM employees
GROUP BY department, employee_name;
```
Explanation: In the regular approach, all columns are included in the GROUP BY clause,
which might lead to a large number of groups and increased query execution time. The
optimized approach reduces the number of grouped columns to only the necessary ones.
3. Use HAVING Wisely:
- Regular Approach:
```sql
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 5;
```
- Optimized Approach:
```sql
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department
HAVING employee_count > 5;
```
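Explanation: In the regular approach, the HAVING clause repeats the aggregate function
(COUNT(*)) in the condition, which can be less readable. The optimized approach uses the
alias `employee_count` directly in the HAVING clause, making the query easier to understand.
This is a readability change rather than a performance one, and it is not portable: MySQL and
BigQuery accept aliases in HAVING, while PostgreSQL and SQL Server do not.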

4. Be Mindful of Ordering in GROUP BY:

- Regular Approach:
```sql
SELECT department, employee_name, MAX(salary) AS max_salary
FROM employees
GROUP BY department, employee_name
ORDER BY max_salary DESC;
```
- Optimized Approach:
```sql
SELECT department, employee_name, MAX(salary) AS max_salary
FROM employees
GROUP BY department, employee_name
ORDER BY department, max_salary DESC;
```
Explanation: In the regular approach, the ORDER BY clause includes `max_salary` without
specifying `department`, which can lead to inconsistent sorting within departments. The
optimized approach includes `department` in the ORDER BY clause to ensure proper sorting
within each department.

5. Utilize Indexes for GROUP BY Columns:

- Regular Approach:
```sql
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department;
```
- Optimized Approach:
```sql
ALTER TABLE employees ADD INDEX idx_department (department);
```
```sql
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department;
```
Explanation: The regular approach doesn't utilize any index for the `department` column,
which can lead to a slower query, especially with a large number of rows. The optimized
approach adds an index on the `department` column, improving the GROUP BY performance.

Remember to analyze query execution plans, profile query performance, and test different
GROUP BY strategies to find the most efficient approach for your specific use case and
database system. Properly setting up GROUP BY clauses is essential for query optimization
and can significantly impact the overall performance of your SQL queries.

WINDOW FUNCTIONS
Window functions and GROUP BY serve different purposes, but they can often achieve similar
results. Here are some tips on using window functions in comparison with GROUP BY and other
use cases:

1. Aggregation with Window Functions vs. GROUP BY:

- GROUP BY Approach:
```sql
SELECT department, SUM(salary) AS total_salary
FROM employees
GROUP BY department;
```
- Window Function Approach:
```sql
SELECT department, salary, SUM(salary) OVER (PARTITION BY department) AS
total_salary
FROM employees;
```
Explanation: Both approaches provide the total salary for each department. The GROUP BY
approach aggregates the data and returns a single row per department. The Window Function
approach uses the SUM() window function to calculate the total salary for each department
without collapsing the result into a single row per group.
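The difference in output shape can be seen on a few hypothetical rows. This sketch runs on SQLite (assumed 3.25 or later) through Python's `sqlite3` module: GROUP BY returns one row per department, while the window function keeps every employee row and repeats the department total on each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department TEXT, salary INTEGER)")
# Hypothetical salaries: two employees in HR, one in IT.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("HR", 100), ("HR", 200), ("IT", 300)],
)

grouped = conn.execute("""
SELECT department, SUM(salary) AS total_salary
FROM employees
GROUP BY department
ORDER BY department
""").fetchall()

windowed = conn.execute("""
SELECT department, salary,
       SUM(salary) OVER (PARTITION BY department) AS total_salary
FROM employees
ORDER BY department, salary
""").fetchall()

print(grouped)   # one row per department
print(windowed)  # one row per employee, total repeated
```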

2. Ranking Rows with Window Functions:

- Regular SELECT Approach:
```sql
SELECT employee_id, first_name, salary
FROM employees
WHERE department = 'HR'
ORDER BY salary DESC
LIMIT 5;
```
- Window Function Approach:
```sql
SELECT employee_id, first_name, salary
FROM (
SELECT employee_id, first_name, salary,
-- RANK is a reserved word in MySQL 8, so the alias avoids it
ROW_NUMBER() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees
WHERE department = 'HR'
) ranked_employees
WHERE salary_rank <= 5;
```
Explanation: The Window Function approach utilizes the ROW_NUMBER() window function
to rank employees based on their salary within the 'HR' department. It then filters the result to
retrieve the top 5 highest salaries.

3. Moving Averages and Aggregations with Window Functions:

- Regular SELECT Approach (Moving Average):
```sql
SELECT date, sale_amount,
(SELECT AVG(sale_amount) FROM sales s2
WHERE s2.date BETWEEN DATE_SUB(s1.date, INTERVAL 2 DAY) AND s1.date) AS
moving_average
FROM sales s1;
```
- Window Function Approach (Moving Average):
```sql
SELECT date, sale_amount,
AVG(sale_amount) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND
CURRENT ROW) AS moving_average
FROM sales;
```
Explanation: The Window Function approach calculates the moving average using the AVG()
window function with a window frame defined by the current row and the two preceding rows
based on the order of the date column. This avoids the need for a correlated subquery.

4. ROWS vs. RANGE Clause in Window Functions:

- ROWS Clause Approach:
```sql
SELECT date, sale_amount,
SUM(sale_amount) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND
CURRENT ROW) AS total_sales
FROM sales;
```
- RANGE Clause Approach:
```sql
SELECT date, sale_amount,
SUM(sale_amount) OVER (ORDER BY date RANGE BETWEEN INTERVAL 2 DAY
PRECEDING AND CURRENT ROW) AS total_sales
FROM sales;
```
Explanation: The two approaches give the same result only when there is exactly one row per
consecutive date. ROWS counts physical rows, while RANGE measures the distance between
ORDER BY values, so gaps in the dates or several rows on the same date make the results differ.
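A sketch of ROWS and RANGE diverging when the data has a gap. It uses hypothetical sales on days 1, 2, and 5, stored as integer day numbers because SQLite's RANGE frames with an offset need a numeric ORDER BY expression (SQLite 3.28 or later is assumed, and this stands in for the `INTERVAL 2 DAY` form above). On day 5, ROWS still reaches back two rows, while RANGE only looks at days 3 to 5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day_num INTEGER, sale_amount INTEGER)")
# Hypothetical data with no sales on days 3 and 4.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)", [(1, 10), (2, 20), (5, 30)]
)

frames = conn.execute("""
SELECT
  day_num,
  SUM(sale_amount) OVER (ORDER BY day_num
                         ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rows_total,
  SUM(sale_amount) OVER (ORDER BY day_num
                         RANGE BETWEEN 2 PRECEDING AND CURRENT ROW) AS range_total
FROM sales
ORDER BY day_num
""").fetchall()

for row in frames:
    print(row)  # (day_num, rows_total, range_total)
```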

When using window functions, consider the specific use case, the window frame, and the
desired result set. Window functions offer powerful capabilities to perform complex calculations
and aggregations without collapsing the data, making them suitable for various analytical tasks.
However, for simple aggregations and grouping, GROUP BY remains a suitable choice. Choose
the appropriate approach based on the query's complexity, performance requirements, and the
specific analysis needed for your data.
Query Optimisation
COUNT(*) vs COUNT(1)
SELECT
COUNT(*) AS num_of_rows
FROM bigquery-public-data.san_francisco.bikeshare_trips;

SELECT
COUNT(1) AS num_of_rows
FROM bigquery-public-data.san_francisco.bikeshare_trips
Tip 1: Only select columns that you really need
SELECT *
FROM bigquery-public-data.san_francisco.bikeshare_trips

SELECT
trip_id,
start_station_name,
end_station_name
FROM bigquery-public-data.san_francisco.bikeshare_trips
Tip 2: Always filter your data according to requirements

SELECT *
FROM bigquery-public-data.san_francisco.bikeshare_trips
WHERE EXTRACT(year from start_date) = 2015
Tip 3: Read less data
How long do bike trips usually last? Calculate the average duration of one-way bike trips in
any one of the cities in the SF dataset.

SELECT
start_station_name,
end_station_name,
AVG(duration_sec) AS avg_time
FROM bigquery-public-data.san_francisco.bikeshare_trips
WHERE start_station_name != end_station_name
GROUP BY start_station_name, end_station_name
Tip 4: Use GROUP BY instead of DISTINCT
Unique list of stations

SELECT
DISTINCT
start_station_name
FROM bigquery-public-data.san_francisco.bikeshare_trips

SELECT
start_station_name
FROM bigquery-public-data.san_francisco.bikeshare_trips
GROUP BY start_station_name
Tip 5: Order your JOINs from larger table to smaller tables
Find the number of bikes and docks currently available at all stations in SF so that proper
restocking can be done.

SELECT
t2.station_id,
t2.name,
t1.bikes_available,
t1.docks_available
FROM `bigquery-public-data.san_francisco.bikeshare_status` AS t1
JOIN `bigquery-public-data.san_francisco.bikeshare_stations` AS t2
ON t2.station_id = t1.station_id
WHERE t2.landmark = "San Francisco"
Business Case Solving
Tables to be Downloaded

1. Customers
2. Suppliers
3. Employees
4. Products
5. Shippers
6. Orders
7. Order_Details
Schema
Ques. Fetch the full name and hiring date of all Employees who work as
Sales Representatives.

SELECT
CONCAT(firstname, " ", lastname) AS full_name,
hiredate
FROM `cochin_traders.employees`
WHERE title = "Sales Representative"
Ques. Which of the products in our inventory need to be reordered?
Note: For now, just use the fields UnitsInStock and ReorderLevel, where UnitsInStock is less than the ReorderLevel, ignoring the fields
UnitsOnOrder and Discontinued.

SELECT
productid,
productname
FROM `cochin_traders.products`
WHERE unitsinstock < reorderlevel
ORDER BY productid
Ques. Find and display the details of customers who have placed more than
5 orders.

SELECT
customerid,
contactname
FROM `cochin_traders.customers`
WHERE customerid IN (SELECT
customerid
FROM `cochin_traders.orders`
GROUP BY customerid
HAVING COUNT(DISTINCT orderid) > 5)
Ques: An employee of ours (Margaret Peacock, EmployeeID 4) has the record of
completing most orders. However, there are some customers who've never placed an
order with her. Show such customers.

SELECT
c.customerid,
c.contactname
FROM `cochin_traders.customers` AS c
LEFT JOIN `cochin_traders.orders` AS o
ON c.customerid = o.customerid AND o.employeeid = 4
WHERE o.customerid IS NULL
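A small check of how the join condition shapes the answer to this kind of question, using hypothetical customers and orders on SQLite through Python's `sqlite3` module. Joining on `employeeid != 4` with no WHERE clause returns every customer, because a LEFT JOIN never drops rows from the left table. Joining on `employeeid = 4` and keeping only the unmatched rows returns the customers who never ordered through employee 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: A ordered with employees 4 and 1, B only with 1, C never ordered.
conn.executescript("""
CREATE TABLE customers (customerid TEXT);
CREATE TABLE orders (customerid TEXT, employeeid INTEGER);
INSERT INTO customers VALUES ('A'), ('B'), ('C');
INSERT INTO orders VALUES ('A', 4), ('A', 1), ('B', 1);
""")

# Join on "not employee 4" without a WHERE clause: nobody is excluded.
join_on_not_4 = [r[0] for r in conn.execute("""
SELECT c.customerid
FROM customers AS c
LEFT JOIN orders AS o
  ON c.customerid = o.customerid AND o.employeeid != 4
ORDER BY c.customerid
""")]

# Anti-join: match only employee 4's orders, then keep the unmatched customers.
never_with_4 = [r[0] for r in conn.execute("""
SELECT c.customerid
FROM customers AS c
LEFT JOIN orders AS o
  ON c.customerid = o.customerid AND o.employeeid = 4
WHERE o.customerid IS NULL
ORDER BY c.customerid
""")]

print(join_on_not_4)
print(never_with_4)
```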
Ques. Retrieve the top 5 best-selling products on the basis of the quantity ordered.

SELECT
productname,
SUM(od.quantity) AS total_qty
FROM `cochin_traders.products` AS p
JOIN `cochin_traders.order_details` AS od
ON p.productid = od.productid
GROUP BY p.productname
ORDER BY total_qty DESC
LIMIT 5
Ques. Analyze the monthly order count for the year 1997.

SELECT
EXTRACT(MONTH FROM orderdate) AS month,
COUNT(DISTINCT orderid) AS num_orders
FROM `cochin_traders.orders`
WHERE EXTRACT(YEAR FROM orderdate) = 1997
GROUP BY month
ORDER BY num_orders DESC
Ques: Calculate the difference in sales revenue for each month compared to
the previous month.

WITH monthly_rev AS (
SELECT
EXTRACT(MONTH FROM orderdate) AS month,
EXTRACT(YEAR FROM orderdate) AS year,
ROUND(SUM((od.unitprice * od.quantity) - (od.discount * (od.unitprice *
od.quantity) / 100)), 2) AS revenue
FROM `cochin_traders.orders` AS o
JOIN `cochin_traders.order_details` AS od
ON o.orderid = od.orderid
GROUP BY month, year
)

SELECT
*,
LAG(m.revenue) OVER (ORDER BY year, month) AS prev_mon_rev,
revenue - LAG(m.revenue) OVER (ORDER BY year, month) AS difference_in_revenue
FROM monthly_rev AS m
ORDER BY year, month
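A worked example of the LAG() step on three hypothetical monthly totals, run on SQLite (assumed 3.25 or later) through Python's `sqlite3` module. The first month has no previous row, so both the previous revenue and the difference are NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_rev (year INTEGER, month INTEGER, revenue INTEGER)")
# Hypothetical, already aggregated monthly revenue.
conn.executemany(
    "INSERT INTO monthly_rev VALUES (?, ?, ?)",
    [(1997, 1, 100), (1997, 2, 150), (1997, 3, 120)],
)

diffs = conn.execute("""
SELECT
  month,
  revenue,
  LAG(revenue) OVER (ORDER BY year, month) AS prev_mon_rev,
  revenue - LAG(revenue) OVER (ORDER BY year, month) AS difference_in_revenue
FROM monthly_rev
ORDER BY year, month
""").fetchall()

for row in diffs:
    print(row)  # (month, revenue, prev_mon_rev, difference_in_revenue)
```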
Ques: Calculate the percentage of total sales revenue for each product.

WITH total_rev AS (
SELECT
SUM(unitprice * quantity) AS rev
FROM `cochin_traders.order_details`
), product_rev AS (
SELECT
p.productname,
SUM(od.unitprice * od.quantity) AS product_rev
FROM `cochin_traders.products` AS p
JOIN `cochin_traders.order_details` AS od
ON p.productid = od.productid
GROUP BY p.productname
)

SELECT
p.productname,
p.product_rev,
(p.product_rev / t.rev) * 100 AS percentage_revenue
FROM total_rev AS t, product_rev AS p
ORDER BY percentage_revenue DESC
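
Alternate: Using a window function

The same percentage could also be computed without the separate total_rev CTE and the cross join. This is a sketch that assumes the same tables and columns as above; the CTE name `per_product` and the alias `pr` are illustrative only. `SUM(...) OVER ()` with an empty window adds up the revenue of every product row, which gives the grand total on each row.

```sql
WITH per_product AS (
  SELECT
    p.productname,
    SUM(od.unitprice * od.quantity) AS product_rev
  FROM `cochin_traders.products` AS p
  JOIN `cochin_traders.order_details` AS od
    ON p.productid = od.productid
  GROUP BY p.productname
)

SELECT
  pr.productname,
  pr.product_rev,
  -- Empty OVER () makes the window the whole result set
  (pr.product_rev / SUM(pr.product_rev) OVER ()) * 100 AS percentage_revenue
FROM per_product AS pr
ORDER BY percentage_revenue DESC
```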

