SQL Class 4 PDF Notes
● Window functions: These are special SQL functions that perform a calculation across a set of related rows.
● How it works: Instead of operating on individual rows, a window function operates on a group or 'window' of rows that are
somehow related to the current row. This allows for complex calculations based on these related rows.
● Window definition: The 'window' in window functions refers to a set of rows. The window can be defined using different
criteria depending on the requirements of your operation.
● Partitions: By using the PARTITION BY clause, you can divide your data into smaller sets or 'partitions'. The window
function will then be applied individually to each partition.
● Order of rows: You can specify the order of rows in each partition using the ORDER BY clause. This order influences how
some window functions calculate their result.
● Frames: The ROWS/RANGE clause lets you further narrow down the window by defining a 'frame' or subset of rows within
each partition.
● Comparison with Aggregate Functions: Unlike aggregate functions that return a single result per group, window
functions return a single result for each row of the table based on the group of rows defined in the window.
● Advantage: Window functions allow for more complex operations that need to take into account not just the current row,
but also its 'neighbours' in some way.
Example
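As a minimal sketch of the difference between an aggregate and a window function, assume a hypothetical table sales_demo(salesperson, amount). The table name, columns, and data are invented for illustration. The first query collapses each group into one row, while the second keeps every row and attaches the group total to it.

```sql
-- Hypothetical sample data, invented for illustration only.
CREATE TABLE sales_demo (salesperson VARCHAR(20), amount INT);
INSERT INTO sales_demo VALUES ('Asha', 100), ('Asha', 250), ('Ravi', 300);

-- Aggregate function: one row per salesperson (2 rows in total).
SELECT salesperson, SUM(amount) AS total_amount
FROM sales_demo
GROUP BY salesperson;

-- Window function: every original row is kept (3 rows in total),
-- each carrying the total of its own partition.
SELECT salesperson,
       amount,
       SUM(amount) OVER (PARTITION BY salesperson) AS total_amount
FROM sales_demo;
```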
Window Function Syntax
function_name (column) OVER (
[PARTITION BY column_name_1, ..., column_name_n]
[ORDER BY column_name_1 [ASC | DESC], ..., column_name_n [ASC | DESC]]
)
● function_name: This is the window function you want to use. Examples include ROW_NUMBER(), RANK(),
DENSE_RANK(), SUM(), AVG(), and many others.
● (column): This is the column that the window function will operate on. Aggregate functions such as SUM(salary) need it, while ranking functions such as ROW_NUMBER() take no argument.
● OVER (): This is where you define the window. The parentheses after OVER contain the specifications for the window.
● PARTITION BY column_name_1, ..., column_name_n: This clause divides the result set into partitions upon which
the window function will operate independently. For example, if you have PARTITION BY salesperson_id, the window
function will calculate a result for each salesperson independently.
● ORDER BY column_name_1 [ASC | DESC], ..., column_name_n [ASC | DESC]: This clause specifies the order of
the rows in each partition. The window function operates on these rows in the order specified. For example, ORDER
BY sales_date DESC will make the window function operate on rows with more recent dates first.
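The clauses above can be combined as in the following sketch, which computes a running total per salesperson. The table daily_sales_demo and its data are hypothetical; only the clause structure comes from the syntax described above.

```sql
-- Hypothetical sample data, invented for illustration only.
CREATE TABLE daily_sales_demo (salesperson_id INT, sales_date DATE, amount INT);
INSERT INTO daily_sales_demo VALUES
  (1, '2024-01-01', 100),
  (1, '2024-01-02', 50),
  (2, '2024-01-01', 70),
  (2, '2024-01-03', 30);

-- PARTITION BY restarts the calculation for each salesperson;
-- ORDER BY makes SUM() accumulate in date order within the partition.
SELECT salesperson_id,
       sales_date,
       amount,
       SUM(amount) OVER (
           PARTITION BY salesperson_id
           ORDER BY sales_date
       ) AS running_total
FROM daily_sales_demo
ORDER BY salesperson_id, sales_date;
-- running_total is 100, 150 for salesperson 1 and 70, 100 for salesperson 2
```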
Different Types of Window Functions
There are three main categories of window functions in SQL: Ranking functions, Value functions, and Aggregate functions. Here's a
brief description and example for each:
Ranking Functions:
● ROW_NUMBER(): Assigns a unique sequential number to each row, starting at 1 and increasing by 1 until the last row.
SELECT Studentname,
Subject,
Marks,
ROW_NUMBER() OVER(ORDER BY Marks DESC) AS RowNumber
FROM ExamResult;
● RANK(): Assigns a rank to each row. Rows with equal values receive the same rank, and the ranks that the tied rows would otherwise have taken are skipped (for example 1, 2, 2, 4).
SELECT Studentname,
Subject,
Marks,
RANK() OVER(ORDER BY Marks DESC) AS MarksRank
FROM ExamResult
ORDER BY MarksRank;
● DENSE_RANK(): Similar to RANK(), but does not skip rankings if there are duplicates.
SELECT Studentname,
Subject,
Marks,
DENSE_RANK() OVER(ORDER BY Marks DESC) AS MarksRank
FROM ExamResult
ORDER BY MarksRank;
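To see how the three ranking functions differ on tied values, the following sketch runs them side by side. The table exam_demo and its four rows are hypothetical, chosen so that two students share the same marks.

```sql
-- Hypothetical sample data with one tie (B and C both have 85).
CREATE TABLE exam_demo (student VARCHAR(20), marks INT);
INSERT INTO exam_demo VALUES ('A', 90), ('B', 85), ('C', 85), ('D', 70);

SELECT student,
       marks,
       ROW_NUMBER() OVER (ORDER BY marks DESC) AS row_num,
       RANK()       OVER (ORDER BY marks DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY marks DESC) AS dense_rnk
FROM exam_demo;
-- marks 90      -> rnk 1, dense_rnk 1
-- marks 85, 85  -> rnk 2, dense_rnk 2 (both rows)
-- marks 70      -> rnk 4, dense_rnk 3
-- row_num runs 1 to 4; which of the tied rows gets 2 or 3 is not guaranteed
```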
Value Functions: These functions return a value taken from another row in the window.
● FIRST_VALUE(): Returns the first value in the window.
SELECT
employee_name,
department,
hours,
FIRST_VALUE(employee_name) OVER (
PARTITION BY department
ORDER BY hours
) least_over_time
FROM
overtime;
● LAST_VALUE(): Returns the last value in the window.
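A sketch of LAST_VALUE() follows, using a hypothetical table overtime_demo with invented data. One point to watch: when ORDER BY is present, the default frame in most databases ends at the current row, so LAST_VALUE() would simply return the current row's value. The explicit frame below extends the window to the end of the partition.

```sql
-- Hypothetical sample data, invented for illustration only.
CREATE TABLE overtime_demo (employee_name VARCHAR(20), department VARCHAR(20), hours INT);
INSERT INTO overtime_demo VALUES
  ('Meera', 'IT', 5), ('John', 'IT', 12), ('Sara', 'HR', 8), ('Tom', 'HR', 3);

SELECT employee_name,
       department,
       hours,
       LAST_VALUE(employee_name) OVER (
           PARTITION BY department
           ORDER BY hours
           -- Without this frame the window would stop at the current row.
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS most_over_time
FROM overtime_demo;
-- IT rows -> 'John' (12 hours); HR rows -> 'Sara' (8 hours)
```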
● LAG(): Returns the value of the previous row.
SELECT
Year,
Quarter,
Sales,
LAG(Sales, 1, 0) OVER(
PARTITION BY Year
ORDER BY Year,Quarter ASC)
AS PreviousQuarterSales
FROM ProductSales;
● LEAD(): Returns the value of the next row.
SELECT Year,
Quarter,
Sales,
LEAD(Sales, 1, 0) OVER(
PARTITION BY Year
ORDER BY Year,Quarter ASC)
AS NextQuarterSales
FROM ProductSales;
Aggregation Functions: These functions perform calculations on the values of the window rows.
● SUM()
● MIN()
● MAX()
● AVG()
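Used with OVER, these aggregates return their result on every row instead of collapsing the partition. The sketch below assumes a hypothetical table salary_demo with invented data.

```sql
-- Hypothetical sample data, invented for illustration only.
CREATE TABLE salary_demo (employee VARCHAR(20), department VARCHAR(20), salary INT);
INSERT INTO salary_demo VALUES
  ('Anu', 'IT', 4000), ('Bala', 'IT', 6000), ('Chen', 'HR', 5000);

SELECT employee,
       department,
       salary,
       SUM(salary) OVER (PARTITION BY department) AS dept_total,
       AVG(salary) OVER (PARTITION BY department) AS dept_avg,
       MIN(salary) OVER (PARTITION BY department) AS dept_min,
       MAX(salary) OVER (PARTITION BY department) AS dept_max
FROM salary_demo;
-- IT rows: total 10000, average 5000, min 4000, max 6000
-- HR row:  5000 in all four columns
```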
Frame Clause in Window Functions
● The frame clause in window functions defines the subset of rows ('frame') used for calculating the result of the function for the
current row.
● It's specified within the OVER() clause after PARTITION BY and ORDER BY.
● The frame is defined by two parts: a start and an end, each relative to the current row.
● Generic syntax for a window function with a frame clause:
function_name (expression) OVER (
[PARTITION BY column_name_1, ..., column_name_n]
[ORDER BY column_name_1 [ASC | DESC], ..., column_name_n [ASC | DESC]]
[ROWS|RANGE BETWEEN frame_start AND frame_end]
)
● Example: for each row, the maximum revenue among the rows dated from three days before to one day after that row's date:
SELECT
shop,
date,
revenue_amount,
MAX(revenue_amount) OVER (
ORDER BY DATE
RANGE BETWEEN INTERVAL '3' DAY PRECEDING
AND INTERVAL '1' DAY FOLLOWING
) AS max_revenue
FROM revenue_per_shop;
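For comparison, a ROWS frame counts physical rows rather than values of the ORDER BY column. Frame bounds can be UNBOUNDED PRECEDING, n PRECEDING, CURRENT ROW, n FOLLOWING, or UNBOUNDED FOLLOWING. The following sketch assumes a hypothetical table revenue_demo with invented data and sums each row with the one before it.

```sql
-- Hypothetical sample data, invented for illustration only.
CREATE TABLE revenue_demo (sale_date DATE, revenue_amount INT);
INSERT INTO revenue_demo VALUES
  ('2024-01-01', 10), ('2024-01-02', 20), ('2024-01-03', 60), ('2024-01-04', 40);

SELECT sale_date,
       revenue_amount,
       SUM(revenue_amount) OVER (
           ORDER BY sale_date
           -- Frame: the previous row and the current row.
           ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
       ) AS two_row_sum
FROM revenue_demo;
-- two_row_sum: 10, 30, 80, 100
```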
Common Table Expression
A Common Table Expression (CTE) in SQL is a named temporary result set that exists only within the execution
scope of a single SQL statement. It is defined with the WITH keyword and can then be referenced like a table in the rest of the statement, as in the following example:
WITH sales_cte AS (
SELECT sales_person, SUM(sales_amount) as total_sales
FROM sales_table
GROUP BY sales_person
)
SELECT sales_person, total_sales
FROM sales_cte
WHERE total_sales > 1000;
● Recursive CTE: This is a CTE that references itself. In other words, the CTE query definition refers back to the CTE
name, creating a loop that ends when a certain condition is met. Recursive CTEs are useful for working with
hierarchical or tree-structured data.
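A recursive CTE over hierarchical data might look like the following sketch. The table staff_demo and its rows are hypothetical. The RECURSIVE keyword is used by PostgreSQL, MySQL, and SQLite; some databases, such as SQL Server, write plain WITH instead.

```sql
-- Hypothetical reporting chain, invented for illustration only.
CREATE TABLE staff_demo (id INT, name VARCHAR(20), manager_id INT);
INSERT INTO staff_demo VALUES (1, 'Head', NULL), (2, 'Lead', 1), (3, 'Dev', 2);

WITH RECURSIVE chain (id, name, depth) AS (
    -- Anchor member: the row with no manager starts the recursion.
    SELECT id, name, 0
    FROM staff_demo
    WHERE manager_id IS NULL
    UNION ALL
    -- Recursive member: add everyone who reports to a row already found.
    -- The recursion stops when this query returns no new rows.
    SELECT s.id, s.name, c.depth + 1
    FROM staff_demo s
    JOIN chain c ON s.manager_id = c.id
)
SELECT id, name, depth
FROM chain
ORDER BY depth;
-- Head 0, Lead 1, Dev 2
```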
Subquery Operators
● IN: The IN operator allows you to specify multiple values in a WHERE clause. It returns true if a value
matches any value in a list.
● NOT IN: The NOT IN operator excludes the values in the list. It returns true if a value does not match
any value in the list.
● ANY: The ANY operator returns true if the comparison holds for at least one value returned by the subquery.
● ALL: The ALL operator returns true if the comparison holds for every value returned by the subquery.
● EXISTS: The EXISTS operator returns true if the subquery returns one or more records.
● NOT EXISTS: The NOT EXISTS operator returns true if the subquery returns no records.
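The operators above can be tried on a small example. The tables customers_demo and orders_demo and their data are hypothetical. ANY and ALL are standard SQL but are not available in every engine (SQLite, for example, does not support them).

```sql
-- Hypothetical sample data, invented for illustration only.
CREATE TABLE customers_demo (cust_id INT, name VARCHAR(20));
CREATE TABLE orders_demo (order_id INT, cust_id INT, amount INT);
INSERT INTO customers_demo VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Kim');
INSERT INTO orders_demo VALUES (10, 1, 500), (11, 1, 200), (12, 2, 900);

-- IN: customers who placed at least one order (Asha, Ravi).
SELECT name
FROM customers_demo
WHERE cust_id IN (SELECT cust_id FROM orders_demo);

-- NOT EXISTS: customers with no orders (Kim).
SELECT c.name
FROM customers_demo c
WHERE NOT EXISTS (SELECT 1 FROM orders_demo o WHERE o.cust_id = c.cust_id);

-- ANY: orders larger than at least one of customer 1's orders (10 and 12).
SELECT order_id
FROM orders_demo
WHERE amount > ANY (SELECT amount FROM orders_demo WHERE cust_id = 1);

-- ALL: orders larger than every one of customer 1's orders (12 only).
SELECT order_id
FROM orders_demo
WHERE amount > ALL (SELECT amount FROM orders_demo WHERE cust_id = 1);
```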
Views
A view in SQL is a virtual table based on the result-set of an SQL statement. It contains rows and
columns, just like a real table. The fields in a view are fields from one or more real tables in the
database.
● You can add SQL functions, WHERE, and JOIN statements to a view and display the data as
if the data were coming from one single table.
● A view always shows up-to-date data. The database engine recreates the data every time a
user queries a view.
● Views can be used to encapsulate complex queries, presenting users with a simpler interface
to the data.
● They can be used to restrict access to sensitive data in the underlying tables, presenting only
non-sensitive data to users.
Syntax to create Views
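A minimal sketch of creating and querying a view follows. The table employees_demo, the view it_staff_view, and the data are hypothetical; the view hides the salary column, in line with the point above about restricting access to sensitive data.

```sql
-- Hypothetical sample data, invented for illustration only.
CREATE TABLE employees_demo (emp_id INT, name VARCHAR(20), department VARCHAR(20), salary INT);
INSERT INTO employees_demo VALUES (1, 'Anu', 'IT', 4000), (2, 'Chen', 'HR', 5000);

-- General form: CREATE VIEW view_name AS SELECT ... ;
CREATE VIEW it_staff_view AS
SELECT emp_id, name              -- salary is deliberately left out
FROM employees_demo
WHERE department = 'IT';

-- The view is queried like a table and always reflects the current data.
SELECT * FROM it_staff_view;     -- returns (1, 'Anu')
```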
Purpose of Indexing
1. Speeding up Query Execution: Indexes reduce the amount of data that needs to be scanned for a query, significantly speeding up
data retrieval operations.
2. Optimizing Search Operations: Indexes help in efficiently searching for records based on the indexed columns.
3. Improving Sorting and Filtering: Indexes assist in sorting and filtering operations by providing a structured way to access data.
4. Enhancing Join Performance: Indexes on join columns improve the performance of join operations between tables.
Advantages of Indexing
1. Faster Data Retrieval: Indexes make search queries faster by providing a quick way to locate rows in a table.
2. Efficient Use of Resources: Reduced query execution time translates to more efficient use of CPU and memory resources.
3. Improved Performance for Large Tables: Indexes are particularly beneficial for large tables where full table scans would be
time-consuming.
4. Better Sorting and Filtering: Indexes can improve the performance of ORDER BY, GROUP BY, and WHERE clauses.
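Choosing Columns to Index
1. Primary Key and Unique Constraints: Columns that are primary keys or have unique constraints should be indexed, as they uniquely
identify rows. Most databases create these indexes automatically when the constraint is declared.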
2. Frequently Used Columns in WHERE Clauses: Index columns that are frequently used in WHERE clauses to filter data.
3. Columns Used in Joins: Index columns that are used in join conditions to speed up join operations.
4. Columns Used in ORDER BY and GROUP BY: Index columns that are used in ORDER BY and GROUP BY clauses for faster sorting
and grouping.
5. Selectivity of the Column: Choose columns with high selectivity (columns with many unique values) to maximize the performance
benefits of the index.
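The guidelines above might translate into index definitions like the following sketch. The table orders_idx_demo and the index names are hypothetical. Whether the optimizer actually uses an index depends on the database and the data, so it is worth checking the query plan with your database's EXPLAIN facility.

```sql
-- Hypothetical table; the primary key is typically indexed automatically.
CREATE TABLE orders_idx_demo (order_id INT PRIMARY KEY, cust_id INT, order_date DATE);

-- Single-column index for filters and joins on cust_id.
CREATE INDEX idx_orders_cust ON orders_idx_demo (cust_id);

-- Composite index for queries that filter by customer and sort by date.
CREATE INDEX idx_orders_cust_date ON orders_idx_demo (cust_id, order_date);

-- A query shaped so that the indexes above can help.
SELECT order_id, order_date
FROM orders_idx_demo
WHERE cust_id = 42
ORDER BY order_date;
```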
Query Optimizations
● Use Column Names Instead of * in a SELECT Statement
Listing only the columns you need typically reduces the amount of data the database has to read and return.
● Use WHERE Instead of HAVING to Filter Rows
The HAVING clause filters groups after the rows have been selected and aggregated. A condition that does not involve an aggregate
belongs in the WHERE clause, so that unwanted rows are discarded before grouping instead of being removed from the final result.
Example:
Original query:
SELECT s.cust_id,count(s.cust_id)
FROM SH.sales s
GROUP BY s.cust_id
HAVING s.cust_id != '1660' AND s.cust_id != '2';
Improved query:
SELECT s.cust_id,count(cust_id)
FROM SH.sales s
WHERE s.cust_id != '1660'
AND s.cust_id !='2'
GROUP BY s.cust_id;
● Eliminate Unnecessary DISTINCT Conditions
In the following example, the DISTINCT keyword in the original query is unnecessary if the result set already contains a primary key
column, because the rows are then guaranteed to be unique. Removing it spares the database the work of checking for duplicates.
Example:
Original query:
SELECT DISTINCT * FROM SH.sales s
JOIN SH.customers c
ON s.cust_id= c.cust_id
WHERE c.cust_marital_status = 'single';
Improved query:
SELECT * FROM SH.sales s JOIN
SH.customers c
ON s.cust_id = c.cust_id
WHERE c.cust_marital_status='single';
● Consider using an IN predicate when querying an indexed column
An IN-list predicate can be used for indexed retrieval, and the optimizer can sort the IN-list to match the sort sequence of the
index, leading to more efficient retrieval.
Example:
Original query:
SELECT s.*
FROM SH.sales s
WHERE s.prod_id = 14
OR s.prod_id = 17;
Improved query:
SELECT s.*
FROM SH.sales s
WHERE s.prod_id IN (14, 17);
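● Use UNION ALL in Place of UNION Where Possible
UNION ALL is faster than UNION because it does not check for duplicates, whereas UNION must look for and remove duplicate rows
whether or not any exist. Use UNION ALL only when duplicates cannot occur or are acceptable, since the two forms return different results when duplicates are present.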
Example:
Original query:
SELECT cust_id
FROM SH.sales
UNION
SELECT cust_id
FROM SH.customers;
Improved query:
SELECT cust_id
FROM SH.sales
UNION ALL
SELECT cust_id
FROM SH.customers;