SQL Documents
A CTE that references itself is called a recursive CTE. Recursive CTEs can be of great
help when displaying hierarchical data, for example, displaying employees in an organization
hierarchy. A simple organization chart is shown below.
Let's create the tblEmployee table, which holds the data shown in the organization chart.
Create Table tblEmployee
(
EmployeeId int Primary key,
Name nvarchar(20),
ManagerId int
)
Since a MANAGER is also an EMPLOYEE, both manager and employee details are
stored in the tblEmployee table. Data from tblEmployee is shown below.
Let's say we want to display EmployeeName along with their ManagerName. The
output should be as shown below.
To achieve this, we can simply join tblEmployee with itself. Joining a table with itself is
called a self join. We discussed Self Joins in Part 14 of this video series. In the
output, notice that since JOSH does not have a Manager, we display 'Super Boss'
instead of NULL. We used the IsNull() function to replace NULL with 'Super Boss'. If you want
to learn more about replacing NULL values, please watch Part 15.
SELF JOIN QUERY:
Select Employee.Name as [Employee Name],
IsNull(Manager.Name, 'Super Boss') as [Manager Name]
from tblEmployee Employee
left join tblEmployee Manager
on Employee.ManagerId = Manager.EmployeeId
Along with the Employee and Manager names, we also want to display their level in the
organization hierarchy. This can be achieved using a recursive CTE that combines two
queries with the UNION ALL operator, as sketched after the explanation below.
The EmployeesCTE contains 2 queries combined with the UNION ALL operator. The first
query selects the EmployeeId, Name, ManagerId, and 1 as the level from tblEmployee where
ManagerId is NULL. So, here we are giving LEVEL = 1 to the super boss (whose ManagerId
is NULL). In the second query, we are joining tblEmployee with EmployeesCTE itself, which
allows us to loop through the hierarchy. Finally, to get the required output, we are
joining EmployeesCTE with itself.
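A sketch of the query (reconstructed from the description above; the [Level] column name
and the final self join are assumed to match the earlier self join example):
With EmployeesCTE (EmployeeId, Name, ManagerId, [Level])
as
(
Select EmployeeId, Name, ManagerId, 1
from tblEmployee
where ManagerId is null
union all
Select tblEmployee.EmployeeId, tblEmployee.Name, tblEmployee.ManagerId, EmployeesCTE.[Level] + 1
from tblEmployee
join EmployeesCTE
on tblEmployee.ManagerId = EmployeesCTE.EmployeeId
)
Select Employee.Name as [Employee Name],
IsNull(Manager.Name, 'Super Boss') as [Manager Name],
Employee.[Level]
from EmployeesCTE Employee
left join EmployeesCTE Manager
on Employee.ManagerId = Manager.EmployeeId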
Another common problem is that data can become inconsistent. For example, let's
say JOHN has resigned and we have a new department head (STEVE) for the IT department.
At present, there are 3 IT department rows in the table, and we need to update all of them.
If I update only one row and forget to update the other 2 rows, then obviously
the data becomes inconsistent.
Another problem: DML queries (Insert, Update and Delete) could become slow, as there
could be many records and columns to process.
So, to reduce the data redundancy, we can divide this large, badly organised table into
two (Employees and Departments), as shown below. Now we have reduced the redundant
department data. So, if we have to update the department head name, we only have one row
to update, even if there are 10 million employees in that department.
Database normalization is a step-by-step process. There are 6 normal forms, First
Normal Form (1NF) through Sixth Normal Form (6NF). Most databases are in Third Normal
Form (3NF). There are certain rules that each normal form should follow.
Now, let's explore the first normal form (1NF). A table is said to be in 1NF if
1. The data in each column is atomic: no multiple values separated by commas.
2. The table does not contain any repeating column groups.
3. Each record is identified uniquely using a primary key.
In the table below, the data in the Employee column is not atomic. It contains multiple
employees separated by commas. From the data you can see that in the IT department we
have 3 employees - Sam, Mike, Shan. Now, let's say I want to change just SHAN's name. It
is not possible; we have to update the entire cell. Similarly, it is not possible to select or
delete just one employee, as the data in the cell is not atomic.
The 2nd rule of the first normal form is that the table should not contain any
repeating column groups. Consider the Employee table below. We have repeated the
Employee column, from Employee1 to Employee3. The problem with this design is that if a
department is going to have more than 3 employees, we have to change the table
structure to add an Employee4 column. The Employee2 and Employee3 columns in the HR
department are NULL, as there is only one employee in this department. The disk space is
simply wasted.
To eliminate the repeating column groups, we divide the table into 2. The
repeating Employee columns are moved into a separate table, with a foreign key pointing to
the primary key of the other table. We also introduced a primary key to uniquely identify each
record.
In this video we will learn about second normal form (2NF) and third normal form (3NF).
A table is said to be in 2NF, if
1. The table meets all the conditions of 1NF
2. Move redundant data to a separate table
3. Create relationship between these tables using foreign keys.
The table below violates second normal form; there is a lot of redundant data in the table.
Let's say in my organization there are 100,000 employees and only 2 departments (IT &
HR). Since we are storing the DeptName, DeptHead and DeptLocation columns in the
same table, all these columns are repeated 100,000 times, which results in
unnecessary duplication of data.
So this table is clearly violating the rules of the second normal form, and the
redundant data can cause the following issues.
1. Disk space wastage
2. Data inconsistency
3. DML queries (Insert, Update, Delete) can become slow
Now, to put this table in the second normal form, we need to break the table into 2 and
move the redundant department data (DeptName, DeptHead and DeptLocation) into its
own table. To link the tables with each other, we use the DeptId foreign key. The tables
below are in 2NF.
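A sketch of the resulting table definitions (table and column names assumed from the
description above):
Create Table tblDepartments
(
DeptId int primary key,
DeptName nvarchar(50),
DeptHead nvarchar(50),
DeptLocation nvarchar(50)
)
Create Table tblEmployees
(
EmpId int primary key,
EmpName nvarchar(50),
DeptId int foreign key references tblDepartments(DeptId)
)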
The table below violates third normal form, because the AnnualSalary column is not fully
dependent on the primary key EmpId. The AnnualSalary is also dependent on the Salary
column. In fact, to compute the AnnualSalary, we multiply the Salary by 12. Since
AnnualSalary is not fully dependent on the primary key, and it can be computed, we can
remove this column from the table, which will then adhere to 3NF.
Let's look at another example of Third Normal Form violation. In the table below,
DeptHead column is not fully dependent on EmpId column. DeptHead is also dependent
on DeptName. So, this table is not in 3NF.
To put this table in 3NF, we break this down into 2, and then move all the columns that
are not fully dependent on the primary key to a separate table as shown below. This design
is now in 3NF.
Pivot is a SQL Server operator that can be used to turn unique values from one column
into multiple columns in the output, thereby effectively rotating a table.
Select * from tblProductSales: As you can see, we have 3 sales agents selling in 3
countries
Now, let's write a query which returns TOTAL SALES, grouped by SALESCOUNTRY
and SALESAGENT. The output should be as shown below.
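A sketch of the query (column names assumed from the tblProductSales table used in this
section):
Select SalesCountry, SalesAgent, SUM(SalesAmount) as Total
from tblProductSales
group by SalesCountry, SalesAgent
order by SalesCountry, SalesAgent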
At this point, let's try to present the same data in a different format using the PIVOT
operator.
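PIVOT QUERY (a sketch, reconstructed to match the tblProductsSale variant shown later in
this section):
Select SalesAgent, India, US, UK
from tblProductSales
Pivot
(
Sum(SalesAmount) for SalesCountry in ([India],[US],[UK])
)
as PivotTable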
This PIVOT query is converting the unique column values (India, US, UK) in
SALESCOUNTRY column, into Columns in the output, along with performing aggregations
on the SALESAMOUNT column. The Outer query, simply, selects SALESAGENT column
from tblProductSales table, along with pivoted columns from the PivotTable.
Having understood the basics of PIVOT, let's look at another example. Let's create
tblProductsSale, a slight variation of the tblProductSales table we have already created. The
table we are creating now has an additional Id column.
Create Table tblProductsSale
(
Id int primary key,
SalesAgent nvarchar(50),
SalesCountry nvarchar(50),
SalesAmount int
)
Now, run the same PIVOT query that we have already created, just by changing the name
of the table to tblProductsSale instead of tblProductSales
Select SalesAgent, India, US, UK
from tblProductsSale
Pivot
(
Sum(SalesAmount) for SalesCountry in ([India],[US],[UK])
)
as PivotTable
Syntax:
BEGIN TRY
{ Any set of SQL statements }
END TRY
BEGIN CATCH
[ Optional: Any set of SQL statements ]
END CATCH
[Optional: Any other SQL Statements]
Any set of SQL statements that can possibly throw an exception is wrapped between the
BEGIN TRY and END TRY blocks. If there is an exception in the TRY block, control
immediately jumps to the CATCH block. If there is no exception, the CATCH block is
skipped, and the statements after the CATCH block are executed.
Errors trapped by a CATCH block are not returned to the calling application. If any
part of the error information must be returned to the application, the code in the CATCH
block must do so by using the RAISERROR() function.
2. Also notice that, in the scope of the CATCH block, there are several system functions
that can be used to retrieve more information about the error that occurred. These functions
return NULL if they are executed outside the scope of the CATCH block.
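For example, a CATCH block along the following lines (a sketch; these are the standard
system error functions, and the divide-by-zero is just a way to force an error):
Begin Try
Select 1/0 -- forces a divide-by-zero error
End Try
Begin Catch
Select ERROR_NUMBER() as ErrorNumber,
ERROR_MESSAGE() as ErrorMessage,
ERROR_SEVERITY() as ErrorSeverity,
ERROR_STATE() as ErrorState,
ERROR_PROCEDURE() as ErrorProcedure,
ERROR_LINE() as ErrorLine
End Catch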
Let's understand transaction processing with an example. For this purpose, let's create
and populate the tblMailingAddress and tblPhysicalAddress tables.
Create Table tblMailingAddress
(
AddressId int NOT NULL primary key,
EmployeeNumber int,
HouseNumber nvarchar(50),
StreetAddress nvarchar(50),
City nvarchar(10),
PostalCode nvarchar(50)
)
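The matching tblPhysicalAddress table (a sketch; the columns are assumed to mirror
tblMailingAddress, with City nvarchar(10) as referenced later in this section):
Create Table tblPhysicalAddress
(
AddressId int NOT NULL primary key,
EmployeeNumber int,
HouseNumber nvarchar(50),
StreetAddress nvarchar(50),
City nvarchar(10),
PostalCode nvarchar(50)
)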
Insert into tblMailingAddress values (1, 101, '#10', 'King Street', 'Londoon', 'CR27DW')
Insert into tblPhysicalAddress values (1, 101, '#10', 'King Street', 'Londoon', 'CR27DW')
The employee with EmployeeNumber 101 has the same address as both his physical and
mailing address. His city name is misspelled as Londoon instead of London. The
following stored procedure, 'spUpdateAddress', updates the physical and mailing
addresses. Both UPDATE statements are wrapped between a BEGIN TRANSACTION
and COMMIT TRANSACTION block, which in turn is wrapped between BEGIN TRY and
END TRY.
So, if both the UPDATE statements succeed, without any errors, then the transaction is
committed. If there are errors, then the control is immediately transferred to the catch block.
The ROLLBACK TRANSACTION statement, in the CATCH block, rolls back the
transaction, and any data that was written to the database by the commands is backed out.
Create Procedure spUpdateAddress
as
Begin
Begin Try
Begin Transaction
-- both updates reconstructed from the description above
Update tblMailingAddress set City = 'LONDON' where AddressId = 1 and EmployeeNumber = 101
Update tblPhysicalAddress set City = 'LONDON' where AddressId = 1 and EmployeeNumber = 101
Commit Transaction
End Try
Begin Catch
Rollback Transaction
End Catch
End
Let's now make the second UPDATE statement fail. The CITY column length in the
tblPhysicalAddress table is 10. The second UPDATE statement fails because the value for
the CITY column is more than 10 characters.
Alter Procedure spUpdateAddress
as
Begin
Begin Try
Begin Transaction
Update tblMailingAddress set City = 'LONDON12'
where AddressId = 1 and EmployeeNumber = 101
-- second update assumed: the value below exceeds the 10-character City column and fails
Update tblPhysicalAddress set City = 'LONDON LONDON'
where AddressId = 1 and EmployeeNumber = 101
Commit Transaction
End Try
Begin Catch
Rollback Transaction
End Catch
End
Now, if we execute spUpdateAddress, the first UPDATE statement succeeds, but the
second UPDATE statement fails. As soon as the second UPDATE statement fails,
control is immediately transferred to the CATCH block. The CATCH block rolls the
transaction back, so the change made by the first UPDATE statement is undone.
Atomic - All statements in the transaction either complete successfully or they are all
rolled back. The task that the set of operations represents is either accomplished or not, but
in any case not left half-done. For example, in the spUpdateInventory_and_Sell stored
procedure, both UPDATE statements should succeed. If one UPDATE statement
succeeds and the other fails, the database should undo the change
made by the first UPDATE statement by rolling it back. In short, the transaction should be
ATOMIC.
Consistent - All data touched by the transaction is left in a logically consistent state. For
example, if stock available numbers are decremented in the tblProducts table, then there
has to be a related entry in the tblProductSales table. The inventory can't just disappear.
Isolated - The transaction must affect data without interfering with other concurrent
transactions, or being interfered with by them. This prevents transactions from making
changes to data based on uncommitted information, for example changes to a record that
are subsequently rolled back. Most databases use locking to maintain transaction
isolation.
Durable - Once a change is made, it is permanent. If a system error or power failure occurs
before a set of commands is complete, those commands are undone and the data is
restored to its original state once the system begins running again.
Insert into tblProducts values ('TV', '52 inch black color LCD TV')
Insert into tblProducts values ('Laptop', 'Very thin black color acer laptop')
Insert into tblProducts values ('Desktop', 'HP high performance desktop')
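The subquery referenced below (a sketch of a typical example: products that have never
been sold, assuming the tblProducts and tblProductSales tables used in this section):
Select [Id], [Name], [Description]
from tblProducts
where [Id] not in (Select Distinct ProductId from tblProductSales)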
Most of the time, subqueries can be very easily replaced with joins. The above query
is rewritten below using joins and produces the same results.
-- left join with an IS NULL filter assumed as the join equivalent of the NOT IN subquery
Select tblProducts.[Id], [Name], [Description]
from tblProducts
left join tblProductSales
on tblProducts.Id = tblProductSales.ProductId
where tblProductSales.ProductId IS NULL
In this example, we have seen how to use a subquery in the where clause.
Let us now discuss using a subquery in the SELECT clause. Write a query to
retrieve the NAME and TOTALQUANTITY sold, using a subquery.
Select [Name],
(Select SUM(QuantitySold) from tblProductSales where ProductId = tblProducts.Id) as
TotalQuantity
from tblProducts
order by Name
From these examples, it should be very clear that a subquery is simply a SELECT
statement that returns a single value and can be nested inside a SELECT, UPDATE,
INSERT, or DELETE statement.
Subqueries are always enclosed in parentheses and are also called inner queries; the
query containing the subquery is called the outer query.
The columns from a table that is present only inside a subquery cannot be used in the
SELECT list of the outer query.
In the example below, the subquery is executed first and only once. The subquery results
are then used by the outer query. A non-correlated subquery can be executed independently
of the outer query.
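A sketch of such a non-correlated subquery (table and column names assumed from this
section):
Select [Id], [Name], [Description]
from tblProducts
where [Id] IN (Select ProductId from tblProductSales)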
If the subquery depends on the outer query for its values, then that subquery is called
a correlated subquery. In the WHERE clause of the subquery below, the "ProductId" column
gets its value from the tblProducts table that is present in the outer query. So, here the
subquery is dependent on the outer query for its value; hence this subquery is a correlated
subquery. Correlated subqueries get executed once for every row that is selected by the
outer query, and a correlated subquery cannot be executed independently of the outer query.
Select [Name],
(Select SUM(QuantitySold) from tblProductSales where ProductId = tblProducts.Id) as
TotalQuantity
from tblProducts
order by Name
Creating a large table with random data for performance testing - Part 61
In this video we will discuss inserting large amounts of random data into SQL
Server tables for performance testing.
If (Exists (select *
from information_schema.tables
where table_name = 'tblProducts'))
Begin
Drop Table tblProducts
End
-- Recreate tables
Create Table tblProducts
(
[Id] int identity primary key,
[Name] nvarchar(50),
[Description] nvarchar(250)
)
-- loop header and insert reconstructed; the loop bound and value format are assumed
Declare @Id int
Set @Id = 1
While(@Id <= 100000)
Begin
Insert into tblProducts values ('Product - ' + CAST(@Id as nvarchar(20)), 'Product - ' + CAST(@Id as nvarchar(20)) + ' Description')
Print @Id
Set @Id = @Id + 1
End
-- declarations and loop reconstructed; tblProductSales is assumed to have
-- (ProductId int, UnitPrice int, QuantitySold int) columns, and the upper
-- limit for unit price is an assumed value
Declare @LowerLimitForProductId int, @UpperLimitForProductId int
Declare @LowerLimitForUnitPrice int, @UpperLimitForUnitPrice int
Declare @LowerLimitForQuantitySold int, @UpperLimitForQuantitySold int
set @LowerLimitForProductId = 1
set @UpperLimitForProductId = 100000
set @LowerLimitForUnitPrice = 1
set @UpperLimitForUnitPrice = 100
set @LowerLimitForQuantitySold = 1
set @UpperLimitForQuantitySold = 10

Declare @Counter int = 1
While(@Counter <= 600000)
Begin
Insert into tblProductSales values
(ROUND((RAND() * (@UpperLimitForProductId - @LowerLimitForProductId)) + @LowerLimitForProductId, 0),
ROUND((RAND() * (@UpperLimitForUnitPrice - @LowerLimitForUnitPrice)) + @LowerLimitForUnitPrice, 0),
ROUND((RAND() * (@UpperLimitForQuantitySold - @LowerLimitForQuantitySold)) + @LowerLimitForQuantitySold, 0))
Print @Counter
Set @Counter = @Counter + 1
End
Finally, check the data in the tables using a simple SELECT query to make sure the
data has been inserted as expected.
Select * from tblProducts
Select * from tblProductSales
The following query returns the list of products that we have sold at least once. This
query is formed using subqueries. When I execute this query I get 306,199 rows in 6
seconds.
Select Id, Name, Description
from tblProducts
where ID IN
(
Select ProductId from tblProductSales
)
At this stage, please clear the query and execution plan cache using the following T-
SQL commands.
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS; -- Clears the data (buffer) cache
GO
DBCC FREEPROCCACHE; -- Clears the execution plan cache
GO
Now, run the query that is formed using joins. Notice that I get the exact same 306,199
rows in 6 seconds.
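A sketch of the join version (the DISTINCT is assumed, so the row count matches the IN
query above):
Select Distinct tblProducts.Id, Name, [Description]
from tblProducts
inner join tblProductSales
on tblProducts.Id = tblProductSales.ProductId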
Please Note: I have used an automated SQL script to insert huge amounts of this random
data. Please watch Part 61 of the SQL Server tutorial, in which we discussed this
automated script.
According to MSDN, in some cases where existence must be checked, a join produces
better performance; otherwise, the nested query must be processed for each result of the
outer query, and in such cases a join approach would yield better results.
The following query returns the products that we have not sold at least once. This query is
formed using subqueries. When I execute this query I get 93,801 rows in 3 seconds.
When I execute the below equivalent query, that uses joins, I get the exact same 93,801
rows in 3 seconds.
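Sketches of both queries (reconstructed; the left join with an IS NULL filter is assumed as
the join equivalent):
Select Id, Name, [Description]
from tblProducts
where Id NOT IN (Select Distinct ProductId from tblProductSales)

Select tblProducts.Id, Name, [Description]
from tblProducts
left join tblProductSales
on tblProducts.Id = tblProductSales.ProductId
where tblProductSales.ProductId IS NULL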
In general, joins work faster than subqueries, but in reality it all depends on the execution
plan that is generated by SQL Server. It does not matter how we have written the query;
SQL Server will always transform it into an execution plan. If SQL Server generates the
same plan from both queries, we will get the same performance.
I would say, rather than going by theory, turn on client statistics and the execution plan to
see the performance of each option, and then make a decision.
However, if there is ever a need to process rows on a row-by-row basis, then
cursors are your choice. Cursors are very bad for performance and should be avoided
whenever possible. Most of the time, cursors can be very easily replaced using joins.
There are different types of cursors in sql server as listed below. We will talk about the
differences between these cursor types in a later video session.
1. Forward-Only
2. Static
3. Keyset
4. Dynamic
Let us now look at a simple example of using a SQL Server cursor to process one row at
a time. We will be using the tblProducts and tblProductSales tables for this example. The
tables here show only 5 rows from each table. However, on my machine, there are 400,000
records in tblProducts and 600,000 records in tblProductSales. If you want to learn
about generating huge amounts of random test data, please watch Part 61 of the SQL
Server video tutorial.
Cursor Example: Let us say, I want to update the UNITPRICE column in tblProductSales
table, based on the following criteria
1. If the ProductName = 'Product - 55', Set Unit Price to 55
2. If the ProductName = 'Product - 65', Set Unit Price to 65
3. If the ProductName is like 'Product - 100%', Set Unit Price to 1000
-- Fetch the row from the result set into the variable
Fetch Next from ProductIdCursor into @ProductId
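The fragment above shows only the fetch. A fuller sketch of the cursor loop (assuming the
criteria listed above; cursor and variable names follow the fragment):
Declare @ProductId int
Declare @ProductName nvarchar(50)
Declare ProductIdCursor Cursor For
Select ProductId from tblProductSales
Open ProductIdCursor
Fetch Next from ProductIdCursor into @ProductId
While(@@FETCH_STATUS = 0)
Begin
Select @ProductName = Name from tblProducts where Id = @ProductId
If(@ProductName = 'Product - 55')
Update tblProductSales set UnitPrice = 55 where ProductId = @ProductId
Else If(@ProductName = 'Product - 65')
Update tblProductSales set UnitPrice = 65 where ProductId = @ProductId
Else If(@ProductName like 'Product - 100%')
Update tblProductSales set UnitPrice = 1000 where ProductId = @ProductId
Fetch Next from ProductIdCursor into @ProductId
End
Close ProductIdCursor
Deallocate ProductIdCursor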
The cursor will loop through each row in the tblProductSales table. As there are 600,000
rows to be processed on a row-by-row basis, it takes around 40 to 45 seconds on my
machine. We can achieve the same thing very easily using a join, which will significantly
improve performance. We will discuss this in our next video session.
To check if the rows have been correctly updated, please use the following query.
Select Name, UnitPrice
from tblProducts join
tblProductSales on tblProducts.Id = tblProductSales.ProductId
where (Name='Product - 55' or Name='Product - 65' or Name like 'Product - 100%')
Update tblProductSales
set UnitPrice =
Case
When Name = 'Product - 55' Then 155
When Name = 'Product - 65' Then 165
-- remaining branch, join and filter assumed from the criteria and text above
When Name like 'Product - 100%' Then 1055
End
from tblProductSales
join tblProducts
on tblProducts.Id = tblProductSales.ProductId
where Name = 'Product - 55' or Name = 'Product - 65' or Name like 'Product - 100%'
When I executed this query on my machine, it took less than a second, whereas the
same thing using a cursor took 45 seconds. Just imagine the impact cursors
have on performance. Cursors should be used as your last option. Most of the time, cursors
can be very easily replaced using joins.
To check the result of the UPDATE statement, use the following query.
Select Name, UnitPrice from
tblProducts join
tblProductSales on tblProducts.Id = tblProductSales.ProductId
where (Name='Product - 55' or Name='Product - 65' or
Name like 'Product - 100%')
In this video we will discuss writing a Transact-SQL query to list all the tables in a SQL
Server database. This is a very common SQL Server interview question.
Object Explorer within SQL Server Management Studio can be used to get the list of tables
in a specific database. However, if we have to write a query to achieve the same, there are
3 system views that we can use.
1. SYSOBJECTS - Supported in SQL Server version 2000, 2005 & 2008
2. SYS.TABLES - Supported in SQL Server version 2005 & 2008
3. INFORMATION_SCHEMA.TABLES - Supported in SQL Server version 2005 & 2008
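The corresponding queries (sketches; the XTYPE filter and the distinct-XTYPE query
referenced below are assumed forms):
Select * from SYSOBJECTS where XTYPE = 'U'
Select * from SYS.TABLES
Select * from INFORMATION_SCHEMA.TABLES where TABLE_TYPE = 'BASE TABLE'
Select Distinct XTYPE from SYSOBJECTS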
Executing the above query on my SAMPLE database returned the following values for
the XTYPE column from SYSOBJECTS.
IT - Internal table
P - Stored procedure
PK - PRIMARY KEY constraint
S - System table
SQ - Service queue
U - User table
V - View
Please check the following MSDN link for all possible XTYPE column values and what they
represent.
http://msdn.microsoft.com/en-us/library/ms177596.aspx
Let's understand writing re-runnable SQL scripts with an example. To create a table
tblEmployee in the Sample database, we would write the following CREATE TABLE script.
USE [Sample]
Create table tblEmployee
(
ID int identity primary key,
Name nvarchar(100),
Gender nvarchar(10),
DateOfBirth DateTime
)
When you run this script once, the table tblEmployee gets created without any errors. If
you run the script again, you will get an error - There is already an object named
'tblEmployee' in the database.
Use [Sample]
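-- Re-runnable version of the script (a sketch; the existence check below is
-- one common approach, assumed from the surrounding description)
If not exists (Select * from information_schema.tables where table_name = 'tblEmployee')
Begin
Create table tblEmployee
(
ID int identity primary key,
Name nvarchar(100),
Gender nvarchar(10),
DateOfBirth DateTime
)
Print 'Table tblEmployee created'
End
Else
Begin
Print 'Table tblEmployee already exists'
End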
The above script is re-runnable, and can be run any number of times. If the table is not
already created, the script will create the table, else you will get a message stating - The
table already exists. You will never get a sql script error.
The SQL Server built-in function OBJECT_ID() can also be used to check for the existence
of the table.
IF OBJECT_ID('tblEmployee') IS NULL
Begin
-- Create Table Script
Print 'Table tblEmployee created'
End
Else
Begin
Print 'Table tblEmployee already exists'
End
Depending on what we are trying to achieve, sometimes we may need to drop the table (if it
already exists) and re-create it. The SQL script below does exactly that.
Use [Sample]
IF OBJECT_ID('tblEmployee') IS NOT NULL
Begin
Drop Table tblEmployee
End
Create table tblEmployee
(
ID int identity primary key,
Name nvarchar(100),
Gender nvarchar(10),
DateOfBirth DateTime
)
Let's look at another example. The following SQL script adds the column "EmailAddress" to
the table tblEmployee. This script is not re-runnable because, if the column already exists,
we get a script error.
Use [Sample]
ALTER TABLE tblEmployee
ADD EmailAddress nvarchar(50)
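A re-runnable version of the script (a sketch; the INFORMATION_SCHEMA.COLUMNS
check is one common approach, assumed here):
Use [Sample]
If not exists (Select * from INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = 'tblEmployee' and COLUMN_NAME = 'EmailAddress')
Begin
ALTER TABLE tblEmployee ADD EmailAddress nvarchar(50)
End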
The Col_length() function can also be used to check for the existence of a column.
If col_length('tblEmployee','EmailAddress') is not null
Begin
Print 'Column already exists'
End
Else
Begin
Print 'Column does not exist'
End
We will be using table tblEmployee for this demo. Use the sql script below, to create and
populate this table with some sample data.
Create table tblEmployee
(
ID int primary key identity,
Name nvarchar(50),
Gender nvarchar(50),
Salary nvarchar(50)
)
The requirement is to group the salaries by gender. The output should be as shown below.
To achieve this we would write a sql query using GROUP BY as shown below.
Select Gender, Sum(Salary) as Total
from tblEmployee
Group by Gender
When you execute this query, we get an error - Operand data type nvarchar is invalid
for sum operator. This is because, when we created the tblEmployee table, the "Salary"
column was created using the nvarchar datatype. The SQL Server Sum() aggregate function
can only be applied to numeric columns. So, let's try to modify the "Salary" column to use the
int datatype. Let's do it using the designer.
1. Right click on "tblEmployee" table in "Object Explorer" window, and select "Design"
2. Change the datatype from nvarchar(50) to int
3. Save the table
At this point, you will get an error message - Saving changes is not permitted. The changes
you have made require the following tables to be dropped and re-created. You have either
made changes to a table that can't be re-created or enabled the option Prevent saving
changes that require the table to be re-created.
So, the obvious next question is, how to alter the database table definition without the
need to drop, re-create and again populate the table with data?
There are 2 options
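Option 1: Use a T-SQL ALTER statement instead of the designer (a sketch, assuming the
Salary column and the int target type discussed above):
Alter table tblEmployee
Alter column Salary int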
Option 2: Disable "Prevent saving changes that require table re-creation" option in sql
server 2008
1. Open Microsoft SQL Server Management Studio 2008
2. Click Tools, select Options
3. Expand Designers, and select "Table and Database Designers"
4. On the right hand side window, uncheck, Prevent saving changes that require table re-
creation
5. Click OK
Insert into tblEmployee values ('Paul Sensit','[email protected]',29,'Male','2007-10-23') -- insert target reconstructed; the tblEmployee table and column order are assumed
The Name, Email, Age and Gender parameters of the spSearchEmployees stored procedure
are optional. Notice that we have set defaults of NULL for all the parameters, and in the
"WHERE" clause we check whether the respective parameter IS NULL.
Create Proc spSearchEmployees
@Name nvarchar(50) = NULL,
@Email nvarchar(50) = NULL,
@Age int = NULL,
@Gender nvarchar(50) = NULL
as
Begin
Select * from tblEmployee where
(Name = @Name OR @Name IS NULL) AND
(Email = @Email OR @Email IS NULL) AND
(Age = @Age OR @Age IS NULL) AND
(Gender = @Gender OR @Gender IS NULL)
End
This stored procedure can be used by a search page that looks as shown below.
The merge statement joins the target table to the source table by using a common column
in both the tables. Based on how the rows match up as a result of the join, we can then
perform insert, update, and delete on the target table.
Example 1 : In the example below, INSERT, UPDATE and DELETE are all performed in
one statement
1. When matching rows are found, StudentTarget table is UPDATED (i.e WHEN
MATCHED)
2. When the rows are present in StudentSource table but not in StudentTarget table those
rows are INSERTED into StudentTarget table (i.e WHEN NOT MATCHED BY TARGET)
3. When the rows are present in StudentTarget table but not in StudentSource table those
rows are DELETED from StudentTarget table (i.e WHEN NOT MATCHED BY SOURCE)
MERGE StudentTarget AS T
USING StudentSource AS S
ON T.ID = S.ID
WHEN MATCHED THEN
UPDATE SET T.NAME = S.NAME
WHEN NOT MATCHED BY TARGET THEN
INSERT (ID, NAME) VALUES(S.ID, S.NAME)
WHEN NOT MATCHED BY SOURCE THEN
DELETE;
Please Note : A MERGE statement should end with a semicolon; otherwise you get an
error stating - A MERGE statement must be terminated by a semi-colon (;).
In practice, we mostly perform INSERTS and UPDATES. The rows that are present in the
target table but not in the source table are usually not deleted from the target table.
Example 2 : In the example below, only INSERT and UPDATE is performed. We are not
deleting the rows that are present in the target table but not in the source table.
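A sketch of such a statement (the DELETE clause from Example 1 is simply omitted):
MERGE StudentTarget AS T
USING StudentSource AS S
ON T.ID = S.ID
WHEN MATCHED THEN
UPDATE SET T.NAME = S.NAME
WHEN NOT MATCHED BY TARGET THEN
INSERT (ID, NAME) VALUES(S.ID, S.NAME);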
What is a transaction
A transaction is a group of commands that change the data stored in a database. A
transaction is treated as a single unit of work: it ensures that either all of the
commands succeed or none of them do. If one of the commands in the transaction fails, all of
the commands fail, and any data that was modified in the database is rolled back. In this
way, transactions maintain the integrity of data in a database.
Example : The following transaction ensures that both the UPDATE statements succeed or
both of them fail if there is a problem with one UPDATE statement.
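A sketch of such a transaction (table names assumed from the earlier address example):
Begin Try
Begin Transaction
Update tblMailingAddress set City = 'LONDON' where AddressId = 1 and EmployeeNumber = 101
Update tblPhysicalAddress set City = 'LONDON' where AddressId = 1 and EmployeeNumber = 101
Commit Transaction
End Try
Begin Catch
Rollback Transaction
End Catch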
Databases are powerful systems and are potentially used by many users or applications at
the same time. Allowing concurrent transactions is essential for performance but may
introduce concurrency issues when two or more transactions are working with the same
data at the same time.
• Dirty Reads
• Lost Updates
• Nonrepeatable Reads
• Phantom Reads
We will discuss what these problems are, in detail and with examples, in our upcoming videos.
One way to solve all these concurrency problems is to allow only one user to execute
only one transaction at any point in time. Imagine what could happen if you have a large
database with several users who each want to execute several transactions: all the
transactions get queued, and they may have to wait a long time before they get a chance to
execute. You get poor performance, and the whole purpose of having a powerful database
system is defeated if you serialize access this way.
At this point you might be thinking, for best performance let us allow all transactions to
execute concurrently. The problem with this approach is that it may cause all sorts of
concurrency problems (i.e Dirty Reads, Lost Updates, Nonrepeatable Reads, Phantom
Reads) if two or more transactions work with the same data at the same time.
SQL Server provides the following transaction isolation levels:
• Read Uncommitted
• Read Committed
• Repeatable Read
• Snapshot
• Serializable
The isolation level that you choose for your transaction defines the degree to which
one transaction must be isolated from resource or data modifications made by other
transactions. Depending on the isolation level you have chosen, you get varying degrees of
performance and concurrency problems. The table below lists the isolation levels along
with their concurrency side effects.
Isolation Level  | Dirty Reads | Lost Updates | Nonrepeatable Reads | Phantom Reads
Read Uncommitted | Yes         | Yes          | Yes                 | Yes
Read Committed   | No          | Yes          | Yes                 | Yes
Repeatable Read  | No          | No           | No                  | Yes
Snapshot         | No          | No           | No                  | No
Serializable     | No          | No           | No                  | No
If you choose the lowest isolation level (i.e Read Uncommitted), it increases the number of
concurrent transactions that can be executed at the same time, but the down side is you
have all sorts of concurrency issues. On the other hand if you choose the highest isolation
level (i.e Serializable), you will have no concurrency side effects, but the downside is that,
this will reduce the number of concurrent transactions that can be executed at the same
time if those transactions work with same data.
In our upcoming videos we will discuss these concurrency problems in detail with examples.
A dirty read happens when one transaction is permitted to read data that has been modified
by another transaction that has not yet been committed. In most cases this would not cause
a problem. However, if the first transaction is rolled back after the second reads the data,
the second transaction has dirty data that does not exist anymore.
Table tblInventory
Dirty Read Example : In the example below, Transaction 1 updates the value of
ItemsInStock to 9. Then it starts to bill the customer. While Transaction 1 is still in progress,
Transaction 2 starts and reads the ItemsInStock value, which is 9 at that moment. At this
point, Transaction 1 fails because of insufficient funds and is rolled back. ItemsInStock is
reverted to the original value of 10, but Transaction 2 is still working with the value it read
(i.e. 9), which no longer exists.
Transaction 1 :
Begin Tran
Update tblInventory set ItemsInStock = 9 where Id=1
-- Billing the customer (delay assumed to simulate the work)
Waitfor Delay '00:00:15'
-- Insufficient funds. Roll the transaction back
Rollback Transaction
Transaction 2 :
Set Transaction Isolation Level Read Uncommitted
Select * from tblInventory where Id=1
Read Uncommitted is the only transaction isolation level that has the dirty read
side effect. It is the least restrictive of all the isolation levels. When this transaction
isolation level is set, it is possible to read uncommitted or dirty data. Another option to read
dirty data is the NOLOCK table hint. The query below is equivalent to the query in
Transaction 2.
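The equivalent NOLOCK query (a sketch):
Select * from tblInventory with (NOLOCK) where Id=1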
As you can see in the diagram below there are 2 transactions - Transaction 1 and
Transaction 2. Transaction 1 starts first, and it is processing an order for 1 iPhone. It sees
ItemsInStock as 10.
At this time Transaction 2 is processing another order for 2 iPhones. It also sees
ItemsInStock as 10. Transaction 2 makes the sale first and updates ItemsInStock with a
value of 8.
At this point Transaction 1 completes the sale and silently overwrites the update of
Transaction 2. As Transaction 1 sold 1 iPhone it has updated ItemsInStock to 9, while it
actually should have updated it to 7.
Example : Open 2 instances of SQL Server Management Studio. From the first window
execute the Transaction 1 code and from the second window execute the Transaction 2
code. Transaction 1 is processing an order for 1 iPhone, while Transaction 2 is processing
an order for 2 iPhones. At the end of both transactions, ItemsInStock should be 7, but we
have a value of 9. This is because Transaction 1 silently overwrites the update of
Transaction 2. This is called the lost update problem.
-- Transaction 1
-- read, delay and decrement reconstructed; the delay length is assumed
Begin Tran
Declare @ItemsInStock int
Select @ItemsInStock = ItemsInStock from tblInventory where Id=1
Waitfor Delay '00:00:10'
Set @ItemsInStock = @ItemsInStock - 1
Update tblInventory Set ItemsInStock = @ItemsInStock where Id=1
Print @ItemsInStock
Commit Transaction
-- Transaction 2
-- shorter delay assumed, so this transaction updates first
Begin Tran
Declare @ItemsInStock int
Select @ItemsInStock = ItemsInStock from tblInventory where Id=1
Waitfor Delay '00:00:01'
Set @ItemsInStock = @ItemsInStock - 2
Update tblInventory Set ItemsInStock = @ItemsInStock where Id=1
Print @ItemsInStock
Commit Transaction
Both the Read Uncommitted and Read Committed transaction isolation levels have the lost
update side effect. The Repeatable Read, Snapshot, and Serializable isolation levels do not
have this side effect. If you run the above transactions using any of these higher isolation
levels (Repeatable Read, Snapshot, or Serializable), you will not have the lost update problem.
The repeatable read isolation level uses additional locking on rows that are read by the
current transaction, and prevents them from being updated or deleted elsewhere. This
solves the lost update problem.
For both of the above transactions, set the Repeatable Read isolation level. Run Transaction 1
first and then, a few seconds later, run Transaction 2. Transaction 1 completes successfully,
but Transaction 2 fails with the following error.
Transaction was deadlocked on lock resources with another process and has been chosen
as the deadlock victim. Rerun the transaction.
The following diagram explains the problem : Transaction 1 starts first and reads
ItemsInStock, getting a value of 10 for the first read. While Transaction 1 is doing some
work, Transaction 2 starts and updates ItemsInStock to 5. Transaction 1 then makes a
second read, and at this point gets a value of 5, resulting in the non-repeatable read
problem.
-- Transaction 1
Begin Transaction
Select ItemsInStock from tblInventory where Id = 1
-- Do Some work
waitfor delay '00:00:10'
-- second read and commit assumed, completing the example
Select ItemsInStock from tblInventory where Id = 1
Commit Transaction
-- Transaction 2
Update tblInventory set ItemsInStock = 5 where Id = 1
Repeatable read or any other higher isolation level should solve the non-repeatable read
problem.
Fixing non repeatable read concurrency problem : To fix the non-repeatable read
problem, set transaction isolation level of Transaction 1 to repeatable read. This will ensure
that the data that Transaction 1 has read, will be prevented from being updated or deleted
elsewhere. This solves the non-repeatable read problem.
When you execute Transaction 1 and 2 from 2 different instances of SQL Server
management studio, Transaction 2 is blocked until Transaction 1 completes and at the end
of Transaction 1, both the reads get the same value for ItemsInStock.
-- Transaction 1
Set transaction isolation level repeatable read
Begin Transaction
Select ItemsInStock from tblInventory where Id = 1
-- Do Some work
waitfor delay '00:00:10'
-- second read and commit assumed, completing the example
Select ItemsInStock from tblInventory where Id = 1
Commit Transaction
-- Transaction 2
Update tblInventory set ItemsInStock = 5 where Id = 1
Phantom read happens when one transaction executes a query twice and it gets a different
number of rows in the result set each time. This happens when a second transaction inserts
a new row that matches the WHERE clause of the query executed by the first transaction.
The following diagram explains the problem : Transaction 1 starts first and reads from the
Emp table where Id is between 1 and 3; 2 rows are retrieved for the first read. While
Transaction 1 is doing some work, Transaction 2 starts and inserts a new employee with
Id = 2. Transaction 1 then makes a second read; 3 rows are retrieved for the second read,
resulting in the phantom read problem.
Phantom read example : Open 2 instances of SQL Server Management Studio. From the
first window execute the Transaction 1 code and from the second window execute the
Transaction 2 code. Notice that when Transaction 1 completes, it gets a different number of
rows for read 1 and read 2, resulting in a phantom read.
-- Transaction 1
Begin Transaction
Select * from tblEmployees where Id between 1 and 3
-- Do Some work
waitfor delay '00:00:10'
Select * from tblEmployees where Id between 1 and 3
Commit Transaction
-- Transaction 2
Insert into tblEmployees values(2, 'Marcus')
Serializable or any other higher isolation level should solve the phantom read problem.
Fixing phantom read concurrency problem : To fix the phantom read problem, set
transaction isolation level of Transaction 1 to serializable. This will place a range lock on the
rows between 1 and 3, which prevents any other transaction from inserting new rows with in
that range. This solves the phantom read problem.
When you execute Transaction 1 and 2 from 2 different instances of SQL Server
management studio, Transaction 2 is blocked until Transaction 1 completes and at the end
of Transaction 1, both the reads get the same number of rows.
-- Transaction 1
Set transaction isolation level serializable
Begin Transaction
Select * from tblEmployees where Id between 1 and 3
-- Do Some work
waitfor delay '00:00:10'
Select * from tblEmployees where Id between 1 and 3
Commit Transaction
-- Transaction 2
-- insert assumed to match the earlier phantom read example
Insert into tblEmployees values(2, 'Marcus')
In this video we will discuss the snapshot isolation level in SQL Server with examples.
As you can see from the table below, just like serializable isolation level, snapshot isolation
level does not have any concurrency side effects.
Let us understand Snapshot isolation with an example. We will be using the following table
tblInventory for this example.
Open 2 instances of SQL Server Management studio. From the first window execute
Transaction 1 code and from the second window execute Transaction 2 code. Notice that
Transaction 2 is blocked until Transaction 1 is completed.
--Transaction 1
Set transaction isolation level serializable
Begin Transaction
Update tblInventory set ItemsInStock = 5 where Id = 1
waitfor delay '00:00:10'
Commit Transaction
-- Transaction 2
Set transaction isolation level serializable
Select ItemsInStock from tblInventory where Id = 1
Now change the isolation level of Transaction 2 to snapshot. To set snapshot isolation level,
it must first be enabled at the database level, and then set the transaction isolation level to
snapshot.
-- Transaction 2
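-- Remainder of Transaction 2 assumed; it mirrors the enable/set/read
-- pattern shown later in this section (database name SampleDB assumed)
Alter database SampleDB SET ALLOW_SNAPSHOT_ISOLATION ON
Set transaction isolation level snapshot
Select ItemsInStock from tblInventory where Id = 1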
From the first window execute Transaction 1 code and from the second window, execute
Transaction 2 code. Notice that Transaction 2 is not blocked and returns the data from the
database as it was before Transaction 1 has started.
Modifying data with snapshot isolation level : Now let's look at an example of what
happens when a transaction that is using snapshot isolation tries to update the same data
that another transaction is updating at the same time. In this case, Transaction 2 fails with
the following error.
Snapshot isolation transaction aborted due to update conflict. You cannot use snapshot
isolation to access table 'dbo.tblInventory' directly or indirectly in database 'SampleDB' to
update, delete, or insert the row that has been modified or deleted by another transaction.
Retry the transaction or change the isolation level for the update/delete statement.
--Transaction 1
Set transaction isolation level serializable
Begin Transaction
Update tblInventory set ItemsInStock = 5 where Id = 1
waitfor delay '00:00:10'
Commit Transaction
-- Transaction 2
-- Enable snapshot isolation for the database
Alter database SampleDB SET ALLOW_SNAPSHOT_ISOLATION ON
-- Set the transaction isolation level to snapshot
Set transaction isolation level snapshot
Update tblInventory set ItemsInStock = 8 where Id = 1
Read Committed Snapshot is not a different isolation level. It is a different way
of implementing the Read Committed isolation level. One problem we have with the Read
Committed isolation level is that it blocks a transaction that is trying to read data which
another transaction is updating at the same time.
The following example demonstrates the above point. Open 2 instances of SQL Server
Management studio. From the first window execute Transaction 1 code and from the
second window execute Transaction 2 code. Notice that Transaction 2 is blocked until
Transaction 1 is completed.
--Transaction 1
Set transaction isolation level Read Committed
Begin Transaction
Update tblInventory set ItemsInStock = 5 where Id = 1
waitfor delay '00:00:10'
Commit Transaction
-- Transaction 2
Set transaction isolation level read committed
Begin Transaction
Select ItemsInStock from tblInventory where Id = 1
Commit Transaction
We can make Transaction 2 use the row versioning technique instead of locks by enabling
Read Committed Snapshot isolation at the database level. Use the following command to
enable READ_COMMITTED_SNAPSHOT isolation.
Alter database SampleDB SET READ_COMMITTED_SNAPSHOT ON
Please note : For the above statement to execute successfully all the other database
connections should be closed.
Let's see if we can achieve the same thing using snapshot isolation level instead of read
committed snapshot isolation level.
Step 3 : Execute Transaction 1 first and then Transaction 2 simultaneously. Just like in the
previous example, notice that the Transaction 2 is not blocked. It immediately returns the
committed data that is in the database before Transaction 1 started.
--Transaction 1
Set transaction isolation level Read Committed
Begin Transaction
Update tblInventory set ItemsInStock = 5 where Id = 1
waitfor delay '00:00:10'
Commit Transaction
-- Transaction 2
Set transaction isolation level snapshot
Begin Transaction
Select ItemsInStock from tblInventory where Id = 1
Commit Transaction
So what is the point in using read committed snapshot isolation level over snapshot
isolation level?
There are some differences between read committed snapshot isolation level and snapshot
isolation level. We will discuss these in our next video.
Enable Snapshot Isolation for the SampleDB database using the following command
Alter database SampleDB SET ALLOW_SNAPSHOT_ISOLATION ON
Open 2 instances of SQL Server Management studio. From the first window execute
Transaction 1 code and from the second window execute Transaction 2 code. Notice that
Transaction 2 is blocked until Transaction 1 is completed. When Transaction 1 completes,
Transaction 2 raises an update conflict and the transaction terminates and rolls back with
an error.
--Transaction 1
Set transaction isolation level snapshot
Begin Transaction
Update tblInventory set ItemsInStock = 8 where Id = 1
waitfor delay '00:00:10'
Commit Transaction
-- Transaction 2
Set transaction isolation level snapshot
Begin Transaction
Update tblInventory set ItemsInStock = 5 where Id = 1
Commit Transaction
Now let's try the same thing using Read Committed Snapshot Isolation.
Step 1 : Disable Snapshot Isolation for the SampleDB database using the following
command
Alter database SampleDB SET ALLOW_SNAPSHOT_ISOLATION OFF
Step 2 : Enable Read Committed Snapshot Isolation at the database level using the
following command
Alter database SampleDB SET READ_COMMITTED_SNAPSHOT ON
Step 3 : Open 2 instances of SQL Server Management studio. From the first window
execute Transaction 1 code and from the second window execute Transaction 2 code.
Notice that Transaction 2 is blocked until Transaction 1 is completed. When Transaction 1
completes, Transaction 2 also completes successfully without any update conflict.
--Transaction 1
Set transaction isolation level read committed
Begin Transaction
Update tblInventory set ItemsInStock = 8 where Id = 1
waitfor delay '00:00:10'
Commit Transaction
-- Transaction 2
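-- Remainder of Transaction 2 assumed; it mirrors the earlier
-- Read Committed reader
Set transaction isolation level read committed
Begin Transaction
Update tblInventory set ItemsInStock = 5 where Id = 1
Commit Transaction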
Existing application : If your application is using the default Read Committed isolation
level, you can very easily make the application use Read Committed Snapshot Isolation
without requiring any change to the application at all. All you need to do is turn on the
READ_COMMITTED_SNAPSHOT option in the database, which will change read
committed isolation to use row versioning when reading committed data.
Transaction 2 has 2 select statements. Notice that both of these select statements return
different data. This is because Read Committed Snapshot Isolation returns the last
committed data before the select statement began and not the last committed data before
the transaction began.
In the following example, both the select statements of Transaction 2 return the same data.
This is because Snapshot Isolation returns the last committed data before the transaction began
and not the last committed data before the select statement began.
When deadlocks occur, SQL Server will choose one of the processes as the deadlock victim
and roll back that process, so the other process can move forward. The transaction that is
chosen as the deadlock victim will produce the following error.
Msg 1205, Level 13, State 51, Line 1
Transaction (Process ID 57) was deadlocked on lock resources with another process and
has been chosen as the deadlock victim. Rerun the transaction.
Let us look at this in action. We will use the following 2 tables for this example.
SQL script to create the tables and populate them with test data
Create table TableA
(
Id int identity primary key,
Name nvarchar(50)
)
Go
-- TableB assumed to mirror TableA, as both tables are used below
Create table TableB
(
Id int identity primary key,
Name nvarchar(50)
)
Go
The following 2 transactions will result in a deadlock. Open 2 instances of SQL Server
Management Studio. From the first window execute the Transaction 1 code and from the
second window execute the Transaction 2 code.
-- Transaction 1
-- delays and cross-table updates assumed; each transaction locks one
-- table and then waits for the other, producing the deadlock
Begin Tran
Update TableA Set Name = 'Mark Transaction 1' where Id = 1
Waitfor delay '00:00:05'
Update TableB Set Name = 'Mark Transaction 1' where Id = 1
Commit Transaction

-- Transaction 2
Begin Tran
Update TableB Set Name = 'Mark Transaction 2' where Id = 1
Waitfor delay '00:00:05'
Update TableA Set Name = 'Mark Transaction 2' where Id = 1
Commit Transaction
Next Video : We will discuss the criteria SQL Server uses to choose a deadlock victim
What is DEADLOCK_PRIORITY
By default, SQL Server chooses a transaction as the deadlock victim that is least expensive
to roll back. However, a user can specify the priority of sessions in a deadlock situation
using the SET DEADLOCK_PRIORITY statement. The session with the lowest deadlock
priority is chosen as the deadlock victim.
DEADLOCK_PRIORITY
1. The default is NORMAL
2. Can be set to LOW, NORMAL, or HIGH
3. Can also be set to an integer value in the range of -10 to 10
LOW : -5
NORMAL : 0
HIGH : 5
1. If the two sessions have different deadlock priorities, the session with the lower priority is selected
as the victim
2. If both the sessions have the same priority, the transaction that is least expensive to
rollback is selected as the victim
3. If both the sessions have the same deadlock priority and the same cost, a victim is
chosen randomly
Open 2 instances of SQL Server Management studio. From the first window execute
Transaction 1 code and from the second window execute Transaction 2 code. We have not
explicitly set DEADLOCK_PRIORITY, so both the sessions have the default
DEADLOCK_PRIORITY which is NORMAL. So in this case SQL Server is going to choose
Transaction 2 as the deadlock victim as it is the least expensive one to rollback.
-- Transaction 1
Begin Tran
Update TableA Set Name = Name + ' Transaction 1' where Id IN (1, 2, 3, 4, 5)
-- delay, cross-table update and commit assumed, mirroring the pattern above
Waitfor delay '00:00:05'
Update TableB Set Name = Name + ' Transaction 1' where Id = 1
Commit Transaction
-- Transaction 2
Begin Tran
Update TableB Set Name = Name + ' Transaction 2' where Id = 1
Update TableA Set Name = Name + ' Transaction 2' where Id IN (1, 2, 3, 4, 5)
-- After a few seconds notice that this transaction will be chosen as the deadlock
-- victim as it is less expensive to rollback this transaction than Transaction 1
Commit Transaction
-- Transaction 1
Begin Tran
Update TableA Set Name = Name + ' Transaction 1' where Id IN (1, 2, 3, 4, 5)
-- delay, cross-table update and commit assumed, as in the previous example
Waitfor delay '00:00:05'
Update TableB Set Name = Name + ' Transaction 1' where Id = 1
Commit Transaction
-- Transaction 2
SET DEADLOCK_PRIORITY HIGH
GO
Begin Tran
Update TableB Set Name = Name + ' Transaction 2' where Id = 1
Update TableA Set Name = Name + ' Transaction 2' where Id IN (1, 2, 3, 4, 5)
Commit Transaction
In this video we will discuss how to write the deadlock information to the SQL Server
error log
When deadlocks occur, SQL Server chooses one of the transactions as the deadlock
victim and rolls it back. There are several ways in SQL Server to track down the queries that
are causing deadlocks. One of the options is to use SQL Server trace flag 1222 to write the
deadlock information to the SQL Server error log.
Enable Trace flag : To enable trace flags, use the DBCC TRACEON command. The -1
parameter indicates that the trace flag must be set at the global level. If you omit the -1
parameter, the trace flag will be set only at the session level.
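For example, to set trace flag 1222 globally:
DBCC Traceon(1222, -1)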
The following SQL code generates a dead lock. This is the same code we discussed in Part
78 of SQL Server Tutorial.
--SQL script to create the tables and populate them with test data
Create table TableA
(
Id int identity primary key,
Name nvarchar(50)
)
Go
-- TableB assumed to mirror TableA, as in the earlier deadlock example
Create table TableB
(
Id int identity primary key,
Name nvarchar(50)
)
Go
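The two stored procedures referenced below are not shown in this extract. A sketch,
assuming they access the tables in opposite order (which is what produces the deadlock):
Create procedure spTransaction1
as
Begin
Begin Tran
Update TableA Set Name = 'Mark Transaction 1' where Id = 1
Waitfor delay '00:00:05'
Update TableB Set Name = 'Mary Transaction 1' where Id = 1
Commit Transaction
End
Go
Create procedure spTransaction2
as
Begin
Begin Tran
Update TableB Set Name = 'Mark Transaction 2' where Id = 1
Waitfor delay '00:00:05'
Update TableA Set Name = 'Mary Transaction 2' where Id = 1
Commit Transaction
End
Go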
Open 2 instances of SQL Server Management studio. From the first window execute
spTransaction1 and from the second window execute spTransaction2.
After a few seconds, notice that one of the transactions completes successfully while the
other is made the deadlock victim and rolled back.
The information about this deadlock should now have been logged in sql server error log.
Next video : How to read and understand the deadlock information that is logged in the sql
server error log
Section         | Description
Deadlock Victim | Contains the ID of the process that was selected as the deadlock victim and killed by SQL Server.
Process List    | Contains the list of the processes that participated in the deadlock.
Process List : The process list has a lot of items. Here are some of them that are particularly
useful in understanding what caused the deadlock.
Node     | Description
Inputbuf | The code the process was executing when the deadlock occurred
Resource List : Some of the items in the resource list that are particularly useful in
understanding what caused the deadlock.
Node        | Description
owner-list  | Contains (owner id) the id of the owning process and the lock mode it has acquired on the resource. The lock mode determines how the resource can be accessed by concurrent transactions: S for Shared lock, U for Update lock, X for Exclusive lock, etc.
waiter-list | Contains (waiter id) the id of the process that wants to acquire a lock on the resource and the lock mode it is requesting.
To prevent the deadlock that we have in our case, we need to ensure that the database
objects (TableA & TableB) are accessed in the same order every time.
To capture the deadlock graph, all you need to do is add the Deadlock graph event to the
trace in SQL Profiler.
4. On the "Events Selection" tab, expand "Locks" section and select "Deadlock graph"
event
The deadlock graph data is captured in XML format. If you want to extract this XML data
to a physical file for later analysis, you can do so by following the steps below.
1. In SQL profiler, click on "File - Export - Extract SQL Server Events - Extract Deadlock
Events"
2. Provide a name for the file
3. The extension for the deadlock xml file is .xdl
4. Finally choose if you want to export all events in a single file or each event in a separate
file
The deadlock information in the XML file is similar to what we have captured using the trace
flag 1222.
• Server Process Id : If you are using SQL Server Management Studio you can see the
server process id on information bar at the bottom.
• Deadlock Priority : If you have not set DEADLOCK PRIORITY explicitly using SET
DEADLOCK PRIORITY statement, then both the processes should have the same default
deadlock priority NORMAL (0).
• Log Used : The transaction log space used. If a transaction has used a lot of log space
then the cost to roll it back is also more. So the transaction that has used the least log
space is killed and rolled back.
• HoBt ID : Heap Or Binary Tree ID. Using this ID, query the sys.partitions view to find the
database objects involved in the deadlock.
SELECT object_name([object_id])
FROM sys.partitions
WHERE hobt_id = 72057594041057280 -- HoBt ID from the deadlock graph (example value, assumed)
6. The arrows represent types of locks each process has on each resource node
Modify the stored procedure as shown below to catch the deadlock error. The code is
commented and is self-explanatory.
Alter procedure spTransaction1 -- procedure name assumed from the earlier deadlock example
as
Begin
Begin Tran
Begin Try
Update TableA Set Name = 'Mark Transaction 1' where Id = 1
Waitfor delay '00:00:05'
Update TableB Set Name = 'Mary Transaction 1' where Id = 1
-- If both the update statements succeeded.
-- No Deadlock occurred. So commit the transaction.
Commit Transaction
Select 'Transaction Successful'
End Try
Begin Catch
-- Check if the error is deadlock error
If(ERROR_NUMBER() = 1205)
Begin
Select 'Deadlock. Transaction failed. Please retry'
End
-- Rollback the transaction
Rollback
End Catch
End
After modifying the stored procedures, execute both the procedures from 2 different
windows simultaneously. Notice that the deadlock error is handled by the catch block.
In our next video, we will discuss how applications using ADO.NET can handle
deadlock errors.
http://csharp-video-tutorials.blogspot.in/2015/08/retry-logic-for-deadlock-exceptions.html
Blocking occurs when there are open (uncommitted) transactions. Let us understand this with an example.
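For example, execute the following from one window and leave it uncommitted (a sketch;
TableA is assumed from the earlier examples):
Begin Tran
Update TableA set Name = 'Mark' where Id = 1
-- no Commit or Rollback yet: the transaction is left open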
Now from a different window, execute any of the following commands. Notice that all the
queries are blocked.
Select Count(*) from TableA
Delete from TableA where Id = 1
Truncate table TableA
Drop table TableA
This is because there is an open transaction. Once the open transaction completes, you will
be able to execute the above queries.
So the obvious next question is - How to identify all the active transactions.
One way to do this is by using DBCC OpenTran. DBCC OpenTran will display only the
oldest active transaction. It is not going to show you all the open transactions.
DBCC OpenTran
The following link has the SQL script that you can use to identify all the active transactions.
http://www.sqlskills.com/blogs/paul/script-open-transactions-with-text-and-plans
The beauty of this script is that it provides a lot more useful information about the open
transactions, such as:
Session Id
Login Name
Database Name
Transaction Begin Time
The actual query that is executed
You can now use this information and ask the respective developer to either commit or
rollback the transactions that they have left open unintentionally.
For some reason if the person who initiated the transaction is not available, you also have
the option to KILL the associated process. However, this may have unintended
consequences, so use it with extreme caution.
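For example (57 here is just an assumed session id; use the one reported by the script):
Kill 57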
EXCEPT operator returns unique rows from the left query that aren’t in the right query’s
results.
Let us understand this with an example. We will use the following 2 tables for this example.
Notice that the following query returns the unique rows from the left query that aren’t in the
right query’s results.
Select Id, Name, Gender
From TableA
Except
Select Id, Name, Gender
From TableB
Result :
To retrieve all of the rows from Table B that do not exist in Table A, reverse the two
queries as shown below.
Select Id, Name, Gender
From TableB
Except
Select Id, Name, Gender
From TableA
Result :
You can also use Except operator on a single table. Let's use the following tblEmployees
table for this example.
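A sketch of the single-table query (it matches the Order By variant shown below, without the
Order By):
Select Id, Name, Gender, Salary
From tblEmployees
Where Salary >= 50000
Except
Select Id, Name, Gender, Salary
From tblEmployees
Where Salary >= 60000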
Result :
The Order By clause should be used only once, after the right query.
Select Id, Name, Gender, Salary
From tblEmployees
Where Salary >= 50000
Except
Select Id, Name, Gender, Salary
From tblEmployees
Where Salary >= 60000
order By Name
The following query returns the rows from the left query that aren’t in the right query’s
results.
Result :
Now execute the following EXCEPT query. Notice that we get only the DISTINCT rows
Select Id, Name, Gender From TableA
Except
Select Id, Name, Gender From TableB
Result:
Now execute the following query. Notice that the duplicate rows are not filtered.
Select Id, Name, Gender From TableA
Where Id NOT IN (Select Id from TableB)
Result:
2. The EXCEPT operator expects the same number of columns in both queries, whereas
NOT IN compares a single column from the outer query with a single column from the
subquery.
Intersect operator retrieves the common records from both the left and the right
query of the Intersect operator.
SQL Script to create the tables and populate with test data
Create Table TableA
(
Id int,
Name nvarchar(50),
Gender nvarchar(10)
)
Go
The following query retrieves the common records from both the left and the right query of
the Intersect operator.
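A sketch of the query (column names taken from the table definitions above):
Select Id, Name, Gender from TableA
Intersect
Select Id, Name, Gender from TableB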
Result :
We can also achieve the same thing using an INNER join. The following INNER join query
produces the exact same result.
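A sketch of that INNER JOIN (joining on all three columns is assumed, to mirror the
INTERSECT column list):
Select A.Id, A.Name, A.Gender
from TableA A
inner join TableB B
on A.Id = B.Id and A.Name = B.Name and A.Gender = B.Gender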
Now execute the following INTERSECT query. Notice that we get only the DISTINCT rows
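The query (reconstructed; same form as above):
Select Id, Name, Gender from TableA
Intersect
Select Id, Name, Gender from TableB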
Result :
Now execute the following INNER JOIN query. Notice that the duplicate rows are not
filtered.
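The join query (reconstructed; same join as sketched above):
Select A.Id, A.Name, A.Gender
from TableA A
inner join TableB B
on A.Id = B.Id and A.Name = B.Name and A.Gender = B.Gender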
Result :
You can make the INNER JOIN behave like INTERSECT operator by using the DISTINCT
operator
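A sketch:
Select Distinct A.Id, A.Name, A.Gender
from TableA A
inner join TableB B
on A.Id = B.Id and A.Name = B.Name and A.Gender = B.Gender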
Result :
2. INNER JOIN treats two NULLS as two different values. So if you are joining two tables
based on a nullable column and if both tables have NULLs in that joining column then,
INNER JOIN will not include those rows in the result-set, where as INTERSECT treats two
NULLs as a same value and it returns all matching rows.
INTERSECT query
Select Id, Name, Gender from TableA
Intersect
Select Id, Name, Gender from TableB
Result :
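INNER JOIN query (a sketch; the join columns are assumed to match the INTERSECT
column list, so rows with NULLs in the join columns drop out):
Select A.Id, A.Name, A.Gender
from TableA A
inner join TableB B
on A.Id = B.Id and A.Name = B.Name and A.Gender = B.Gender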
Result :
The UNION operator returns all the unique rows from both the left and the right query.
UNION ALL includes the duplicates as well.
INTERSECT operator retrieves the common unique rows from both the left and the right
query.
EXCEPT operator returns unique rows from the left query that aren’t in the right query’s
results.
Let us understand these differences with examples. We will use the following 2 tables for
the examples.
UNION operator returns all the unique rows from both the queries. Notice the duplicates are
removed.
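A sketch of the UNION query (changing UNION to UNION ALL in the same query produces the duplicates described next):
Select Id, Name, Gender From TableA
Union
Select Id, Name, Gender From TableB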
Result :
UNION ALL operator returns all the rows from both the queries, including the duplicates.
Result :
INTERSECT operator retrieves the common unique rows from both the left and the right
query. Notice the duplicates are removed.
Result :
EXCEPT operator returns unique rows from the left query that aren’t in the right query’s
results.
Result :
If you want the rows that are present in Table B but not in Table A, reverse the queries.
Result :
For all these 3 operators to work, the following 2 conditions must be met
• The number and the order of the columns must be the same in both the queries
• The data types must be the same or at least compatible
For example, if the number of columns are different, you will get the following error
Msg 205, Level 16, State 1, Line 1
All queries combined using a UNION, INTERSECT or EXCEPT operator must have an
equal number of expressions in their target lists
SQL Script to create the tables and populate with test data
Create table Department
(
Id int primary key,
DepartmentName nvarchar(50)
)
Go
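The Employee table script is not reproduced in this document; a minimal sketch consistent with the queries below:
Create table Employee
(
Id int primary key,
Name nvarchar(50),
Gender nvarchar(10),
Salary int,
DepartmentId int foreign key references Department(Id)
)
Go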
We want to retrieve all the matching rows between Department and Employee tables.
This can be very easily achieved using an Inner Join as shown below.
Select D.DepartmentName, E.Name, E.Gender, E.Salary
from Department D
Inner Join Employee E
On D.Id = E.DepartmentId
Now if we want to retrieve all the matching rows between Department and Employee
tables + the non-matching rows from the LEFT table (Department)
This can be very easily achieved using a Left Join as shown below.
Select D.DepartmentName, E.Name, E.Gender, E.Salary
from Department D
Left Join Employee E
On D.Id = E.DepartmentId
Now let's assume we do not have access to the Employee table. Instead we have access to
the following Table Valued function, that returns all employees belonging to a department
by Department Id.
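The first lines of the function are not reproduced in this document; a minimal sketch of such an inline table valued function, assuming the Employee table sketched earlier:
Create function fn_GetEmployeesByDepartmentId(@DepartmentId int)
returns table
as
return (
Select Id, Name, Gender, Salary, DepartmentId
from Employee
where DepartmentId = @DepartmentId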
)
Go
The following query returns the employees of the department with Id =1.
Select * from fn_GetEmployeesByDepartmentId(1)
Now if you try to perform an Inner or Left join between Department table and
fn_GetEmployeesByDepartmentId() function you will get an error.
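A query along the following lines reproduces the problem (a sketch, assuming the function above):
Select D.DepartmentName, E.Name, E.Gender, E.Salary
from Department D
Inner Join fn_GetEmployeesByDepartmentId(D.Id) E
On D.Id = E.DepartmentId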
If you execute the above query you will get the following error
Msg 4104, Level 16, State 1, Line 3
The multi-part identifier "D.Id" could not be bound.
This is where we use Cross Apply and Outer Apply operators. Cross Apply is
semantically equivalent to Inner Join and Outer Apply is semantically equivalent to Left
Outer Join.
Just like Inner Join, Cross Apply retrieves only the matching rows from the Department
table and fn_GetEmployeesByDepartmentId() table valued function.
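A sketch of the Cross Apply version:
Select D.DepartmentName, E.Name, E.Gender, E.Salary
from Department D
Cross Apply fn_GetEmployeesByDepartmentId(D.Id) E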
Just like Left Outer Join, Outer Apply retrieves all matching rows from the Department table
and fn_GetEmployeesByDepartmentId() table valued function + non-matching rows from
the left table (Department)
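And a sketch of the Outer Apply version:
Select D.DepartmentName, E.Name, E.Gender, E.Salary
from Department D
Outer Apply fn_GetEmployeesByDepartmentId(D.Id) E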
• The APPLY operator introduced in SQL Server 2005, is used to join a table to a table-
valued function.
• The Table Valued Function on the right hand side of the APPLY operator gets called for
each row from the left table (also called the outer table).
• Cross Apply returns only matching rows (semantically equivalent to Inner Join)
• Outer Apply returns matching + non-matching rows (semantically equivalent to Left Outer
Join). The unmatched columns of the table valued function will be set to NULL
Certain system stored procedures that perform DDL-like operations can also fire DDL
triggers. Example - sp_rename system stored procedure
DDL triggers scope : DDL triggers can be created in a specific database or at the server
level.
Create Trigger trMyFirstTrigger -- the Create Trigger line is reconstructed; the name matches the enable/disable examples below
ON Database
FOR CREATE_TABLE
AS
BEGIN
Print 'New table created'
END
1. In the Object Explorer window, expand the SampleDB database by clicking on the
plus symbol.
2. Expand Programmability folder
3. Expand Database Triggers folder
Please note : If you can't find the trigger that you just created, make sure to refresh the
Database Triggers folder.
When you execute the following code to create the table, the trigger will automatically fire
and will print the message - New table created
Create Table Test (Id int)
The above trigger will be fired only for one DDL event CREATE_TABLE. If you want this
trigger to be fired for multiple events, for example when you alter or drop a table, then
separate the events using a comma as shown below.
Alter Trigger trMyFirstTrigger -- the Alter Trigger line is reconstructed; the original listing is truncated here
ON Database
FOR CREATE_TABLE, ALTER_TABLE, DROP_TABLE
AS
BEGIN
Print 'A table has just been created, modified or deleted'
END
Now if you create, alter or drop a table, the trigger will fire automatically and you will get the
message - A table has just been created, modified or deleted.
The 2 DDL triggers above execute some code in response to DDL events
Now let us look at an example of how to prevent users from creating, altering or dropping
tables. To do this modify the trigger as shown below.
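The modified trigger is not reproduced in this document; a minimal sketch that rolls back the DDL statement:
Alter Trigger trMyFirstTrigger
ON Database
FOR CREATE_TABLE, ALTER_TABLE, DROP_TABLE
AS
BEGIN
Rollback
Print 'You cannot create, alter or drop a table'
END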
To be able to create, alter or drop a table, you either have to disable or delete the trigger.
To disable trigger
1. Right click on the trigger in object explorer and select "Disable" from the context menu
2. You can also disable the trigger using the following T-SQL command
DISABLE TRIGGER trMyFirstTrigger ON DATABASE
To enable trigger
1. Right click on the trigger in object explorer and select "Enable" from the context menu
2. You can also enable the trigger using the following T-SQL command
ENABLE TRIGGER trMyFirstTrigger ON DATABASE
To drop trigger
1. Right click on the trigger in object explorer and select "Delete" from the context menu
2. You can also drop the trigger using the following T-SQL command
DROP TRIGGER trMyFirstTrigger ON DATABASE
Certain system stored procedures that perform DDL-like operations can also fire DDL
triggers. The following trigger will be fired when ever you rename a database object using
sp_rename system stored procedure.
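The trigger listing is not reproduced in this document; a minimal sketch that fires on the RENAME event:
Create Trigger trRenameTable
ON Database
FOR RENAME
AS
BEGIN
Print 'You just renamed something'
END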
The following code changes the name of the TestTable to NewTestTable. When this code is
executed, it will fire the trigger trRenameTable
sp_rename 'TestTable', 'NewTestTable'
The following code changes the name of the Id column in NewTestTable to NewId. When
this code is executed, it will fire the trigger trRenameTable
sp_rename 'NewTestTable.Id' , 'NewId', 'column'
The following trigger is a database scoped trigger. This will prevent users from creating,
altering or dropping tables only from the database in which it is created.
If you have another database on the server, users will still be able to create, alter or drop
tables in that database. If you want to prevent them from doing this, you may create the
trigger again in that database.
But, what if you have 100 different databases on your SQL Server, and you want to
prevent users from creating, altering or dropping tables from all these 100 databases?
Creating the same trigger for all the 100 different databases is not a good approach for 2
reasons.
1. It is tedious and error prone
2. Maintainability is a nightmare. If for some reason you have to change the trigger, you will
have to do it in 100 different databases, which again is tedious and error prone.
This is where server-scoped DDL triggers come in handy. When you create a server scoped
DDL trigger, it will fire in response to the DDL events happening in all of the databases on
that server.
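The server scoped trigger listing is not reproduced in this document; a minimal sketch (the trigger name is an assumption):
Create Trigger tr_ServerScopedDDLTrigger
ON ALL SERVER
FOR CREATE_TABLE, ALTER_TABLE, DROP_TABLE
AS
BEGIN
Rollback
Print 'You cannot create, alter or drop a table in any database on this server'
END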
Now if you try to create, alter or drop a table in any of the databases on the server, the
trigger will be fired.
Server scoped triggers will always fire before any of the database scoped triggers.
This execution order cannot be changed.
Create Trigger tr_DatabaseScopeTrigger1 -- the Create Trigger line is reconstructed; the name matches the sp_settriggerorder example below
ON DATABASE
FOR CREATE_TABLE
AS
BEGIN
Print 'Database Scope Trigger'
END
GO
Using the sp_settriggerorder stored procedure, you can set the execution order of server-
scoped or database-scoped triggers.
EXEC sp_settriggerorder
@triggername = 'tr_DatabaseScopeTrigger1',
@order = 'none',
@stmttype = 'CREATE_TABLE',
@namespace = 'DATABASE'
GO
If you have a database-scoped and a server-scoped trigger handling the same event,
and you have set the execution order at both the levels, here is the execution order of the
triggers.
1. The server-scope trigger marked First
2. Other server-scope triggers
3. The server-scope trigger marked Last
4. The database-scope trigger marked First
5. Other database-scope triggers
6. The database-scope trigger marked Last
The following trigger audits all table changes in all databases on a SQL Server
CREATE TRIGGER tr_AuditTableChanges
ON ALL SERVER
FOR CREATE_TABLE, ALTER_TABLE, DROP_TABLE
AS
BEGIN
DECLARE @EventData XML
SELECT @EventData = EVENTDATA()
-- the remainder of the trigger body, which writes these values to an audit table, is not reproduced in this document
END
In the above example we are using EventData() function which returns event data in XML
format. The following XML is returned by the EventData() function when I created a table
with name = MyTable in SampleDB database.
<EVENT_INSTANCE>
<EventType>CREATE_TABLE</EventType>
<PostTime>2015-09-11T16:12:49.417</PostTime>
<SPID>58</SPID>
<ServerName>VENKAT-PC</ServerName>
<LoginName>VENKAT-PC\Tan</LoginName>
<UserName>dbo</UserName>
<DatabaseName>SampleDB</DatabaseName>
<SchemaName>dbo</SchemaName>
<ObjectName>MyTable</ObjectName>
<ObjectType>TABLE</ObjectType>
<TSQLCommand>
<SetOptions ANSI_NULLS="ON" ANSI_NULL_DEFAULT="ON"
ANSI_PADDING="ON" QUOTED_IDENTIFIER="ON"
ENCRYPTED="FALSE" />
<CommandText>
Create Table MyTable
(
Id int,
Name nvarchar(50),
Gender nvarchar(50)
)
</CommandText>
</TSQLCommand>
</EVENT_INSTANCE>
As the name implies Logon triggers fire in response to a LOGON event. Logon triggers
fire after the authentication phase of logging in finishes, but before the user session is
actually established.
Logon trigger example : The following trigger limits the maximum number of open
connections for a user to 3.
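The trigger listing is not reproduced in this document; a minimal sketch of such a logon trigger (the trigger name is an assumption; the open connection count is read from sys.dm_exec_sessions):
Create Trigger tr_LogonLimit
ON ALL SERVER
FOR LOGON
AS
BEGIN
DECLARE @LoginName nvarchar(100)
SET @LoginName = ORIGINAL_LOGIN()
IF (SELECT COUNT(*) FROM sys.dm_exec_sessions
    WHERE is_user_process = 1 AND original_login_name = @LoginName) > 3
BEGIN
Print 'Fourth connection of ' + @LoginName + ' blocked' -- written to the error log
Rollback
END
END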
The trigger error message will be written to the error log. Execute the following command to
read the error log.
Execute sp_readerrorlog
The SELECT INTO statement in SQL Server, selects data from one table and inserts it
into a new table.
2. Copy all rows and columns from an existing table into a new table in an external
database.
SELECT * INTO HRDB.dbo.EmployeesBackup FROM Employees
6. Create a new table whose columns and datatypes match those of an existing table.
SELECT * INTO EmployeesBackup FROM Employees WHERE 1 <> 1
The WHERE condition is never true, so the new table gets the structure of the Employees
table but none of its rows.
7. Copy all rows and columns from an existing table into a new table on a different SQL
Server instance. For this, create a linked server and use the 4 part naming convention
SELECT * INTO TargetTable
FROM [SourceServer].[SourceDB].[dbo].[SourceTable]
Please note : You cannot use SELECT INTO statement to select data into an existing
table. For this you will have to use INSERT INTO statement.
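A quick sketch of the INSERT INTO alternative (ExistingTable and SourceTable are hypothetical names):
INSERT INTO ExistingTable (Id, Name) -- hypothetical tables; both must already exist
SELECT Id, Name FROM SourceTable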
Let us understand the difference with an example. For the examples in this video we will
use the following Sales table.
SQL Script to create and populate Sales table with test data
Create table Sales
(
Product nvarchar(50),
SaleAmount int
)
Go
To calculate total sales by product, we would write a GROUP BY query as shown below
SELECT Product, SUM(SaleAmount) AS TotalSales
FROM Sales
GROUP BY Product
Now if we want to find only those products where the total sales amount is greater than
$1000, we will use HAVING clause to filter products
SELECT Product, SUM(SaleAmount) AS TotalSales
FROM Sales
GROUP BY Product
HAVING SUM(SaleAmount) > 1000
Result :
If we use WHERE clause instead of HAVING clause, we will get a syntax error. This is
because the WHERE clause doesn’t work with aggregate functions like sum, min, max, avg,
etc.
SELECT Product, SUM(SaleAmount) AS TotalSales
FROM Sales
GROUP BY Product
WHERE SUM(SaleAmount) > 1000
So in short, the difference is that the WHERE clause cannot be used with aggregates
whereas HAVING can.
However, there are other differences as well that we need to keep in mind when using
WHERE and HAVING clauses. WHERE clause filters rows before aggregate calculations
are performed where as HAVING clause filters rows after aggregate calculations are
performed. Let us understand this with an example.
Total sales of iPhone and Speakers can be calculated by using either WHERE or HAVING
clause
Calculate Total sales of iPhone and Speakers using WHERE clause : In this example
the WHERE clause retrieves only iPhone and Speaker products and then performs the
sum.
SELECT Product, SUM(SaleAmount) AS TotalSales
FROM Sales
WHERE Product in ('iPhone', 'Speakers')
GROUP BY Product
Result :
Calculate Total sales of iPhone and Speakers using HAVING clause : This example
retrieves all rows from Sales table, performs the sum and then removes all products except
iPhone and Speakers.
SELECT Product, SUM(SaleAmount) AS TotalSales
FROM Sales
GROUP BY Product
HAVING Product in ('iPhone', 'Speakers')
Result :
So from a performance standpoint, HAVING is slower than WHERE and should be avoided
when possible.
Another difference is WHERE comes before GROUP BY and HAVING comes after GROUP
BY.
2. WHERE comes before GROUP BY. This means WHERE clause filters rows before
aggregate calculations are performed. HAVING comes after GROUP BY. This means
HAVING clause filters rows after aggregate calculations are performed. So from a
performance standpoint, HAVING is slower than WHERE and should be avoided when
possible.
3. WHERE and HAVING can be used together in a SELECT query. In this case WHERE
clause is applied first to filter individual rows. The rows are then grouped and aggregate
calculations are performed, and then the HAVING clause filters the groups
Table Valued Parameter is a new feature introduced in SQL SERVER 2008. Table Valued
Parameter allows a table (i.e multiple rows of data) to be passed as a parameter to a stored
procedure from T-SQL code or from an application. Prior to SQL SERVER 2008, it was not
possible to pass a table variable as a parameter to a stored procedure.
Let us understand how to pass multiple rows to a stored procedure using Table Valued
Parameter with an example. We want to insert multiple rows into the following Employees
table. At the moment this table does not have any rows.
Create Table Employees
(
Id int primary key, -- the Create Table and Id lines are reconstructed; the original listing is truncated here
Name nvarchar(50),
Gender nvarchar(10)
)
Go
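Step 1 of the walkthrough, creating the User-defined Table Type, is not reproduced in this document. A minimal sketch, assuming the same columns as the Employees table (the type name EmpTableType is an assumption):
Create Type EmpTableType as Table
(
Id int primary key,
Name nvarchar(50),
Gender nvarchar(10)
)
Go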
Step 2 : Use the User-defined Table Type as a parameter in the stored procedure. Table
valued parameters must be passed as read-only to stored procedures, functions etc. This
means you cannot perform DML operations like INSERT, UPDATE or DELETE on a table-
valued parameter in the body of a function, stored procedure etc.
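A sketch of such a procedure, assuming the EmpTableType type from Step 1:
Create Procedure spInsertEmployees
@EmpTableType EmpTableType READONLY
as
Begin
Insert into Employees
Select * from @EmpTableType
End
Go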
Step 3 : Declare a table variable, insert the data and then pass the table variable as a
parameter to the stored procedure.
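A sketch of Step 3, assuming the procedure above (the sample rows are assumptions):
Declare @EmployeeTableType EmpTableType
Insert into @EmployeeTableType values (1, 'Mark', 'Male')
Insert into @EmployeeTableType values (2, 'Mary', 'Female')
Insert into @EmployeeTableType values (3, 'John', 'Male')
Execute spInsertEmployees @EmployeeTableType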
That's it. Now select the data from Employees table and notice that all the rows of the table
variable are inserted into the Employees table.
In our next video, we will discuss how to pass table as a parameter to the stored
procedure from an ADO.NET application
We will be using the following Employees table for the examples in this video.
Create Table Employees
(
Id int primary key, -- the first lines are reconstructed; the original listing is truncated here
Name nvarchar(50),
Gender nvarchar(10),
Salary int,
Country nvarchar(10)
)
Go
We want to calculate Sum of Salary by Country and Gender. The result should be as
shown below.
We can very easily achieve this using a Group By query as shown below
Select Country, Gender, Sum(Salary) as TotalSalary
From Employees
Group By Country, Gender
Within the same result set we also want Sum of Salary just by Country. The Result should
be as shown below. Notice that Gender column within the resultset is NULL as we are
grouping only by Country column
To achieve the above result we could combine 2 Group By queries using UNION ALL as
shown below.
Select Country, Gender, Sum(Salary) as TotalSalary
From Employees
Group By Country, Gender
UNION ALL
Select Country, NULL, Sum(Salary) as TotalSalary
From Employees
Group By Country
Within the same result set we also want Sum of Salary just by Gender. The Result should
be as shown below. Notice that the Country column within the resultset is NULL as we are
grouping only by Gender column.
We can achieve this by combining 3 Group By queries using UNION ALL as shown below
Select Country, Gender, Sum(Salary) as TotalSalary
From Employees
Group By Country, Gender
UNION ALL
Select Country, NULL, Sum(Salary) as TotalSalary
From Employees
Group By Country
UNION ALL
Select NULL, Gender, Sum(Salary) as TotalSalary
From Employees
Group By Gender
Finally we also want the grand total of Salary. In this case we are not grouping on any
particular column. So both Country and Gender columns will be NULL in the resultset.
To achieve this we will have to combine the fourth query using UNION ALL as shown
below.
Select Country, Gender, Sum(Salary) as TotalSalary
From Employees
Group By Country, Gender
UNION ALL
Select Country, NULL, Sum(Salary) as TotalSalary
From Employees
Group By Country
UNION ALL
Select NULL, Gender, Sum(Salary) as TotalSalary
From Employees
Group By Gender
UNION ALL
Select NULL, NULL, Sum(Salary) as TotalSalary
From Employees
If we use the Grouping Sets feature introduced in SQL Server 2008, the amount of T-SQL
code that we have to write is greatly reduced. The following Grouping Sets query
produces the same result as the above UNION ALL query.
The order of the rows in the result set is not the same as in the case of UNION ALL query.
To control the order use order by as shown below.
Select Country, Gender, Sum(Salary) as TotalSalary
From Employees
Group By Grouping Sets
(
(Country, Gender), -- Sum of Salary by Country and Gender
(Country),         -- Sum of Salary by Country
(Gender),          -- Sum of Salary by Gender
()                 -- Grand Total
)
Order By Grouping(Country), Grouping(Gender), Gender
Let us understand Rollup in SQL Server with examples. We will use the following
Employees table for the examples in this video.
There are several ways to achieve this. The easiest way is by using Rollup with Group By.
SELECT Country, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY ROLLUP(Country)
--OR
SELECT Country, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY Country
UNION ALL
SELECT NULL, SUM(Salary) AS TotalSalary
FROM Employees
Group Salary by Country and Gender. Also compute the Subtotal for Country level and
Grand Total as shown below.
SELECT Country, Gender, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY ROLLUP(Country, Gender)
--OR
SELECT Country, Gender, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY Country, Gender WITH ROLLUP
--OR
SELECT Country, Gender, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY Country, Gender
UNION ALL
SELECT Country, NULL, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY Country
UNION ALL
SELECT NULL, NULL, SUM(Salary) AS TotalSalary
FROM Employees
Let us understand Cube() in SQL Server with examples. We will use the following
Employees table for the examples in this video.
Write a query to retrieve Sum of Salary grouped by all combinations of the following 2
columns as well as Grand Total.
Country
Gender
SELECT Country, Gender, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY CUBE(Country, Gender)
--OR
SELECT Country, Gender, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY Country, Gender WITH CUBE
The above query is equivalent to the following UNION ALL query. While the data in the
result set is the same, the ordering is not. Use ORDER BY to control the ordering of rows in
the result set.
SELECT Country, Gender, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY Country, Gender
UNION ALL
SELECT Country, NULL, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY Country
UNION ALL
SELECT NULL, Gender, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY Gender
UNION ALL
SELECT NULL, NULL, SUM(Salary) AS TotalSalary
FROM Employees
Let us understand this difference with an example. Consider the following Sales table.
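The Sales table script is not reproduced in this document; a minimal sketch consistent with the columns referenced below:
Create table Sales
(
Id int primary key,
Continent nvarchar(50),
Country nvarchar(50),
City nvarchar(50),
SaleAmount int
)
Go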
ROLLUP(Continent, Country, City) produces Sum of SaleAmount for the following hierarchy
Continent, Country, City
Continent, Country
Continent
()
CUBE(Continent, Country, City) produces Sum of SaleAmount for all the following column
combinations
Continent, Country, City
Continent, Country
Continent, City
Continent
Country, City
Country
City
()
You won't see any difference when you use ROLLUP and CUBE on a single column. Both
of the following queries produce the same output.
SELECT Continent, SUM(SaleAmount) AS TotalSales
FROM Sales
GROUP BY ROLLUP(Continent)
-- OR
SELECT Continent, SUM(SaleAmount) AS TotalSales
FROM Sales
GROUP BY CUBE(Continent)
This is a continuation of Part 104. Please watch Part 104 of the SQL Server tutorial before
proceeding. We will use the following Sales table for this example.
GROUPING returns 1 when the data is aggregated on that column in the result set, and 0
when it is not. The following query demonstrates this.
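A sketch of that query, assuming the Sales table above:
SELECT Continent, Country, City, SUM(SaleAmount) AS TotalSales,
GROUPING(Continent) AS GP_Continent,
GROUPING(Country) AS GP_Country,
GROUPING(City) AS GP_City
FROM Sales
GROUP BY ROLLUP(Continent, Country, City)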
Result :
SELECT
CASE WHEN
GROUPING(Continent) = 1 THEN 'All' ELSE ISNULL(Continent, 'Unknown')
END AS Continent,
CASE WHEN
GROUPING(Country) = 1 THEN 'All' ELSE ISNULL(Country, 'Unknown')
END AS Country,
CASE
WHEN GROUPING(City) = 1 THEN 'All' ELSE ISNULL(City, 'Unknown')
END AS City,
SUM(SaleAmount) AS TotalSales
FROM Sales
GROUP BY ROLLUP(Continent, Country, City)
Result :
Well, you can, but only if your data does not contain NULL values. Let me explain what I
mean.
At the moment the raw data in our Sales table has no NULL values. Let's introduce a NULL
value in the City column of the row where Id = 1.
Result : Notice that the actual NULL value in the raw data is also replaced with the word
'All', which is incorrect. Hence the need for the Grouping function.
Please note : Grouping function can be used with Rollup, Cube and Grouping Sets
Syntax : GROUPING function is used on single column, where as the column list for
GROUPING_ID function must match with GROUP BY column list.
GROUPING(Col1)
GROUPING_ID(Col1, Col2, Col3,...)
GROUPING_ID() function concatenates all the GROUPING() functions, performs the binary
to decimal conversion, and returns the equivalent integer. In short
GROUPING_ID(A, B, C) = GROUPING(A) + GROUPING(B) + GROUPING(C) (concatenated
as a binary string)
Let us understand this with an example.
Query result :
Row Number 1 : Since the data is not aggregated by any column, GROUPING(Continent),
GROUPING(Country) and GROUPING(City) return 0, and as a result we get a binary string
with all ZEROS (000). When this is converted to decimal we get 0, which is displayed in the
GPID column.
Row Number 7 : The data is aggregated for the Country and City columns, so
GROUPING(Country) and GROUPING(City) return 1, whereas GROUPING(Continent)
returns 0. As a result we get the binary string (011). When this is converted to decimal we
get 3, which is displayed in the GPID column.
Row Number 15 : This is the Grand total row. Notice in this row the data is aggregated by
all the 3 columns. Hence all the 3 GROUPING functions return 1. So we get a binary string
with all ONES (111). When this is converted to decimal we get 7, which is displayed in the
GPID column.
Use of GROUPING_ID function : GROUPING_ID function is very handy if you want to sort
and filter by level of grouping.
Result :
Filter by level of grouping : The following query retrieves only continent level aggregated
data
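A sketch of such a filter, assuming the Sales table above (GROUPING_ID = 3 means Country and City are aggregated, leaving the continent level rows):
SELECT Continent, Country, City, SUM(SaleAmount) AS TotalSales,
GROUPING_ID(Continent, Country, City) AS GPID
FROM Sales
GROUP BY ROLLUP(Continent, Country, City)
HAVING GROUPING_ID(Continent, Country, City) = 3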
Result :
Setting up the Debugger in SSMS : If you have connected to SQL Server using (local) or .
(period), then when you start the debugger you will get the following error
Unable to start T-SQL Debugging. Could not connect to computer.
To fix this error, use the computer name to connect to the SQL Server instead of using
(local) or .
For the examples in this video we will be using the following stored procedure.
Create procedure spPrintEvenNumbers
@Target int
as
Begin
Declare @StartNumber int
Set @StartNumber = 1
-- the loop below is a reconstruction; the original listing is truncated here
while (@StartNumber <= @Target)
Begin
if (@StartNumber % 2 = 0) Print @StartNumber
Set @StartNumber = @StartNumber + 1
End
End
Connect to SQL Server using your computer name, and then execute the above code to
create the stored procedure. At this point, open a New Query window. Copy and paste the
following T-SQL code to execute the stored procedure.
Starting the Debugger in SSMS : There are 2 ways to start the debugger
1. In SSMS, click on the Debug Menu and select Start Debugging
2. Use the keyboard shortcut ALT + F5
At this point you should have the debugger running. The line that is about to be executed is
marked with a yellow arrow.
Step Over, Step into and Step Out in SSMS : You can find the keyboard shortcuts in the
Debug menu in SSMS.
Let us understand what Step Over, Step into and Step Out does when debugging the
following piece of code
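The script itself is not reproduced in this document; a sketch consistent with the discussion below (line numbers are shown only because the text refers to them; the target value 10 is an assumption):
LINE 1 : Declare @TargetNumber int
LINE 2 : Set @TargetNumber = 10
LINE 3 : Execute spPrintEvenNumbers @TargetNumber
LINE 4 : Print 'Done'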
1. There is no difference when you STEP INTO (F11) or STEP OVER (F10) the code on
LINE 2
2. On LINE 3, we are calling a Stored Procedure. On this statement if we press F10 (STEP
OVER), it won't give us the opportunity to debug the stored procedure code. To be able to
debug the stored procedure code you will have to STEP INTO it by pressing F11.
3. If the debugger is in the stored procedure, and you don't want to debug line by line with in
that stored procedure, you can STEP OUT of it by pressing SHIFT + F11. When you do
this, the debugger completes the execution of the stored procedure and waits on the next
line in the main query, i.e on LINE 4 in this example.
Show Next Statement shows the next statement that the debugger is about to execute.
Run to Cursor command executes all the statements in a batch up to the current cursor
position
Locals Window in SSMS : Displays the current values of variables and parameters
If you cannot see the locals window or if you have closed it and if you want to open it, you
can do so using the following menu option. Locals window is only available if you are in
DEBUG mode.
Watch Window in SSMS : Just like Locals window, Watch window is used to watch the
values of variables. You can add and remove variables from the watch window. To add a
variable to the Watch Window, right click on the variable and select "Add Watch" option
from the context menu.
Call Stack Window in SSMS : Allows you to navigate up and down the call stack to see
what values your application is storing at different levels. It's an invaluable tool for
determining why your code is doing what it's doing.
Immediate Window in SSMS : Very helpful during debugging to evaluate expressions, and
print variable values. To clear immediate window type >cls and press enter.
Enable, Disable or Delete all breakpoints : There are 2 ways to Enable, Disable or Delete
all breakpoints
1. From the Debug menu in SSMS
2. From the Breakpoints window. To view Breakpoints window select Debug => Windows
=> Breakpoints or use the keyboard shortcut ALT + CTRL + B
Conditional Breakpoint : Conditional Breakpoints are hit only when the specified condition
is met. These are extremely useful when you have some kind of a loop and you want to
break only when the loop variable has a specific value (for example, loop variable = 100).
For example :
COUNT(Gender) OVER (PARTITION BY Gender) will partition the data by GENDER i.e.
there will be 2 partitions (Male and Female) and then the COUNT() function is applied over
each partition.
Any of the following functions can be used. Please note this is not the complete list.
COUNT(), AVG(), SUM(), MIN(), MAX(), ROW_NUMBER(), RANK(), DENSE_RANK() etc.
Example : We will use the following Employees table for the examples in this video.
Write a query to retrieve total count of employees by Gender. Also in the result we want
Average, Minimum and Maximum salary by Gender. The result of the query should be as
shown below.
This can be very easily achieved using a simple GROUP BY query as shown below.
SELECT Gender, COUNT(*) AS GenderTotal, AVG(Salary) AS AvgSal,
MIN(Salary) AS MinSal, MAX(Salary) AS MaxSal
FROM Employees
GROUP BY Gender
What if we want non-aggregated values (like employee Name and Salary) in the result set
along with aggregated values?
One way to achieve this is by including the aggregations in a subquery and then JOINING it
with the main query as shown in the example below. Look at the amount of T-SQL code we
have to write.
SELECT Name, Salary, Employees.Gender, Genders.GenderTotals,
Genders.AvgSal, Genders.MinSal, Genders.MaxSal
FROM Employees
INNER JOIN
(SELECT Gender, COUNT(*) AS GenderTotals,
AVG(Salary) AS AvgSal,
MIN(Salary) AS MinSal, MAX(Salary) AS MaxSal
FROM Employees
GROUP BY Gender) AS Genders
ON Genders.Gender = Employees.Gender
Better way of doing this is by using the OVER clause combined with PARTITION BY
SELECT Name, Salary, Gender,
COUNT(Gender) OVER(PARTITION BY Gender) AS GenderTotals,
AVG(Salary) OVER(PARTITION BY Gender) AS AvgSal,
MIN(Salary) OVER(PARTITION BY Gender) AS MinSal,
MAX(Salary) OVER(PARTITION BY Gender) AS MaxSal
FROM Employees
Row_Number function
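The syntax and example query for this section are not reproduced in this document; a minimal sketch:
ROW_NUMBER() OVER (ORDER BY Col1, Col2, ...)

SELECT Name, Gender, Salary,
ROW_NUMBER() OVER (ORDER BY Gender) AS RowNumber
FROM Employees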
Please note : If ORDER BY clause is not specified you will get the following error
The function 'ROW_NUMBER' must have an OVER clause with ORDER BY
Use case for Row_Number function : Deleting all duplicate rows except one from a sql
server table.
Discussed in detail in Part 4 of SQL Server Interview Questions and Answers video series.
For example : If you have 2 rows at rank 1 and you have 5 rows in total.
RANK() returns - 1, 1, 3, 4, 5
DENSE_RANK returns - 1, 1, 2, 3, 4
Syntax :
RANK() OVER (ORDER BY Col1, Col2, ...)
DENSE_RANK() OVER (ORDER BY Col1, Col2, ...)
Example : We will use the following Employees table for the examples in this video
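The table script and the first query are not reproduced in this document; a sketch of the RANK and DENSE_RANK query without PARTITION BY:
SELECT Name, Salary, Gender,
RANK() OVER (ORDER BY Salary DESC) AS [Rank],
DENSE_RANK() OVER (ORDER BY Salary DESC) AS DenseRank
FROM Employees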
DENSE_RANK() on the other hand will not skip ranks if there is a tie. The first 2 rows get
rank 1. Third row gets rank 2.
RANK() and DENSE_RANK() functions with PARTITION BY clause : Notice when the
partition changes from Female to Male Rank is reset to 1
-- the SELECT list below is reconstructed; only the FROM line survives in the original
SELECT Name, Salary, Gender,
RANK() OVER (PARTITION BY Gender ORDER BY Salary DESC) AS [Rank],
DENSE_RANK() OVER (PARTITION BY Gender ORDER BY Salary DESC) AS DenseRank
FROM Employees
Use case for RANK and DENSE_RANK functions : Both these functions can be used to
find Nth highest salary. However, which function to use depends on what you want to do
when there is a tie. Let me explain with an example.
If there are 2 employees with the FIRST highest salary, there are 2 different business
cases
• If your business case is, not to produce any result for the SECOND highest salary, then
use RANK function
• If your business case is to return the next Salary after the tied rows as the SECOND
highest Salary, then use DENSE_RANK function
Since we have 2 Employees with the FIRST highest salary, the Rank() function will not
return any rows for the SECOND highest Salary.
WITH Result AS
(
SELECT Salary, RANK() OVER (ORDER BY Salary DESC) AS Salary_Rank
FROM Employees
)
SELECT TOP 1 Salary FROM Result WHERE Salary_Rank = 2
Though we have 2 Employees with the FIRST highest salary, the Dense_Rank() function
returns the next Salary after the tied rows as the SECOND highest Salary.
WITH Result AS
(
SELECT Salary, DENSE_RANK() OVER (ORDER BY Salary DESC) AS Salary_Rank
FROM Employees
)
SELECT TOP 1 Salary FROM Result WHERE Salary_Rank = 2
You can also use RANK and DENSE_RANK functions to find the Nth highest Salary among
Male or Female employee groups. The following query finds the 3rd highest salary amount
paid among the Female employees group
WITH Result AS
(
SELECT Salary, Gender,
DENSE_RANK() OVER (PARTITION BY Gender ORDER BY Salary DESC)
AS Salary_Rank
FROM Employees
)
SELECT TOP 1 Salary FROM Result WHERE Salary_Rank = 3
AND Gender = 'Female'
• Returns an increasing integer value starting at 1 based on the ordering of rows imposed by
the ORDER BY clause (if there are no ties)
• ORDER BY clause is required
• PARTITION BY clause is optional
• When the data is partitioned, the integer value is reset to 1 when the partition changes
We will use the following Employees table for the examples in this video
Notice that no two employees in the table have the same salary. So all the 3 functions
RANK, DENSE_RANK and ROW_NUMBER produce the same increasing integer value
when ordered by Salary column.
You will only see the difference when there are ties (duplicate values in the column used in
the ORDER BY clause).
To do this, first delete the existing data from the Employees table
DELETE FROM Employees
Notice 3 employees have the same salary 8000. When you execute the following query you
can clearly see the difference between RANK, DENSE_RANK and ROW_NUMBER
functions.
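A sketch of that comparison query:
SELECT Name, Salary,
ROW_NUMBER() OVER (ORDER BY Salary DESC) AS RowNumber,
RANK() OVER (ORDER BY Salary DESC) AS [Rank],
DENSE_RANK() OVER (ORDER BY Salary DESC) AS DenseRank
FROM Employees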
• ROW_NUMBER : Returns an increasing unique number for each row starting at 1, even if
there are duplicates.
• RANK : Returns an increasing unique number for each row starting at 1. When there are
duplicates, same rank is assigned to all the duplicate rows, but the next row after the
duplicate rows will have the rank it would have been assigned if there had been no
duplicates. So RANK function skips rankings if there are duplicates.
• DENSE_RANK : Returns an increasing unique number for each row starting at 1. When
there are duplicates, same rank is assigned to all the duplicate rows but the
DENSE_RANK function will not skip any ranks. This means the next row after the
duplicate rows will have the next rank in the sequence
Create Table Employees -- the Create Table line is reconstructed; the original listing is truncated here
(
Id int primary key,
Name nvarchar(50),
Gender nvarchar(10),
Salary int
)
Go
-- reconstructed sketch; only the FROM line survives in the original. Ordering by the unique Id column:
SELECT Name, Salary,
SUM(Salary) OVER (ORDER BY Id) AS RunningTotal
FROM Employees
So when computing running total, it is better to use a column that has unique data in the
ORDER BY clause
We will use the following Employees table for the examples in this video.
NTILE function without PARTITION BY clause : Divides the 10 rows into 3 groups; 4
rows in the first group, 3 rows each in the 2nd and 3rd groups.
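A sketch of the NTILE query without PARTITION BY:
SELECT Name, Gender, Salary,
NTILE(3) OVER (ORDER BY Salary) AS [Ntile]
FROM Employees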
What if the specified number of groups is GREATER THAN the number of rows?
NTILE function will try to create as many groups as possible with one row in each group.
With 10 rows in the table, NTILE(11) will create 10 groups with 1 row in each group.
NTILE function with PARTITION BY clause : When the data is partitioned, NTILE function
creates the specified number of groups with in each partition.
The following query partitions the data into 2 partitions (Male & Female). NTILE(3) creates 3
groups in each of the partitions.
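A sketch of the NTILE query with PARTITION BY:
SELECT Name, Gender, Salary,
NTILE(3) OVER (PARTITION BY Gender ORDER BY Salary) AS [Ntile]
FROM Employees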
• Lead function is used to access subsequent row data along with current row data
• Lag function is used to access previous row data along with current row data
• ORDER BY clause is required
• PARTITION BY clause is optional
Syntax
LEAD(Column_Name, Offset, Default_Value) OVER (ORDER BY Col1, Col2, ...)
LAG(Column_Name, Offset, Default_Value) OVER (ORDER BY Col1, Col2, ...)
Lead and Lag functions example WITHOUT partitions : This example Leads 2 rows and
Lags 1 row from the current row.
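A sketch of that query:
SELECT Name, Gender, Salary,
LEAD(Salary, 2, -1) OVER (ORDER BY Salary) AS Lead_2,
LAG(Salary, 1, -1) OVER (ORDER BY Salary) AS Lag_1
FROM Employees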
• When you are on the first row, LEAD(Salary, 2, -1) allows you to move forward 2 rows and
retrieve the salary from the 3rd row.
• When you are on the first row, LAG(Salary, 1, -1) allows us to move backward 1 row.
Since there are no rows before row 1, the Lag function in this case returns the default
value -1.
• When you are on the last row, LEAD(Salary, 2, -1) allows you to move forward 2 rows.
Since there are no rows beyond the last row, the Lead function in this case returns the
default value -1.
• When you are on the last row, LAG(Salary, 1, -1) allows us to move backward 1 row and
retrieve the salary from the previous row.
Lead and Lag functions example WITH partitions : Notice that in this example, Lead and
Lag functions return default value if the number of rows to lead or lag goes beyond first row
or last row in the partition.
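A sketch of the partitioned version:
SELECT Name, Gender, Salary,
LEAD(Salary, 2, -1) OVER (PARTITION BY Gender ORDER BY Salary) AS Lead_2,
LAG(Salary, 1, -1) OVER (PARTITION BY Gender ORDER BY Salary) AS Lag_1
FROM Employees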
FIRST_VALUE function
-- reconstructed sketch; only the FROM line survives in the original
SELECT Name, Gender, Salary,
FIRST_VALUE(Name) OVER (PARTITION BY Gender ORDER BY Salary) AS FirstValue
FROM Employees
OVER Clause defines the partitioning and ordering of a set of rows (i.e. a window) for the
above functions to operate on. Hence these functions are called window functions. The
OVER clause accepts the following three arguments to define a window for these functions
to operate on.
• PARTITION BY : divides the rows into partitions
• ORDER BY : specifies the logical order of the rows within each partition
• ROWS or RANGE : limits the rows within the partition by specifying start and end points
for the window frame
Compute average salary and display it against every employee row as shown below.
As you can see from the result below, the above query does not produce the overall salary
average. It produces the average of the current row and the rows preceding the current
row. This is because the default value of the ROWS or RANGE clause (RANGE BETWEEN
UNBOUNDED PRECEDING AND CURRENT ROW) is applied.
To fix this, provide an explicit value for ROWS or RANGE clause as shown below. ROWS
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING tells the window
function to operate on the set of rows starting from the first row in the partition to the last
row in the partition.
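A sketch of the corrected query:
SELECT Name, Salary,
AVG(Salary) OVER (ORDER BY Salary
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS AvgSalary
FROM Employees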
The same result can also be achieved by using RANGE BETWEEN UNBOUNDED
PRECEDING AND UNBOUNDED FOLLOWING
The following query can be used if you want to compute the average salary of
1. The current row
2. One row PRECEDING the current row and
3. One row FOLLOWING the current row
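A sketch of that query:
SELECT Name, Salary,
AVG(Salary) OVER (ORDER BY Salary
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS AvgSalary
FROM Employees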
Let us understand the difference with an example. We will use the following Employees
table in this demo.
Create Table Employees -- the Create Table line is reconstructed; the original listing is truncated here
(
Id int primary key,
Name nvarchar(50),
Salary int
)
Go
Calculate the running total of Salary and display it against every employee row
The following query calculates the running total. We have not specified an explicit value for
ROWS or RANGE clause.
SELECT Name, Salary,
SUM(Salary) OVER(ORDER BY Salary) AS RunningTotal
FROM Employees
This means the above query can be re-written using an explicit value for ROWS or RANGE
clause as shown below.
SELECT Name, Salary,
SUM(Salary) OVER(ORDER BY Salary
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS
RunningTotal
FROM Employees
We can also achieve the same result, by replacing RANGE with ROWS
SELECT Name, Salary,
SUM(Salary) OVER(ORDER BY Salary
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS
RunningTotal
FROM Employees
Execute the following UPDATE script to introduce duplicate values in the Salary column
Update Employees set Salary = 1000 where Id = 2
Update Employees set Salary = 3000 where Id = 4
Go
Now execute the query that uses the default RANGE clause. Notice that we don't get the
running total as expected, because RANGE treats the duplicate Salary values as a single
entity.
SELECT Name, Salary,
SUM(Salary) OVER(ORDER BY Salary
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS
RunningTotal
FROM Employees
Now replace RANGE with ROWS and execute the query again. Notice that this time we get
the running total as expected.
SELECT Name, Salary,
SUM(Salary) OVER(ORDER BY Salary
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS
RunningTotal
FROM Employees
So, the main difference between ROWS and RANGE is in the way duplicate rows are
treated. ROWS treats duplicates as distinct values, whereas RANGE treats them as a
single entity.
All together side by side. The following query shows how running total changes
1. When no value is specified for ROWS or RANGE clause
2. When RANGE clause is used explicitly with its default value
3. When ROWS clause is used instead of RANGE clause
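A sketch of that side-by-side query:
SELECT Name, Salary,
SUM(Salary) OVER (ORDER BY Salary) AS [Default],
SUM(Salary) OVER (ORDER BY Salary
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS [Range],
SUM(Salary) OVER (ORDER BY Salary
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS [Rows]
FROM Employees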